[for-4.0,0/6] vhost-user-blk: Add support for backend reconnecting

Message ID: 20181206063552.6701-1-xieyongji@baidu.com

Message

Yongji Xie Dec. 6, 2018, 6:35 a.m. UTC
From: Xie Yongji <xieyongji@baidu.com>

This patchset aims to support QEMU reconnecting to the
vhost-user-blk backend after the backend crashes or
restarts.

Patch 1 implements a synchronous connection for the
"reconnect socket".

Patch 2 introduces a new message, VHOST_USER_SET_VRING_INFLIGHT,
which offers shared memory to the backend so that it can record
its inflight I/O.

Patches 3 and 4 are the corresponding libvhost-user patches for
patch 2: they make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.

Patch 5 enables vhost-user-blk to reconnect to the backend when
the connection is closed.

Patch 6 tells QEMU that we now support reconnecting.

To use it, start QEMU with:

qemu-system-x86_64 \
        -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
        -device vhost-user-blk-pci,chardev=char0 \

and start vhost-user-blk backend with:

vhost-user-blk -b /path/file -s /path/vhost.socket

Then we can restart vhost-user-blk at any time while the VM is running.
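
As a rough illustration of the inflight-recording idea (a minimal
sketch only, not the layout used by this series; all names below are
hypothetical), a split-virtqueue backend could use the shared region
like this:

#include <stdint.h>

#define VQ_SIZE 256

/* One flag per descriptor head, living in the shared region that
 * arrives with VHOST_USER_SET_VRING_INFLIGHT (hypothetical layout). */
static uint8_t *inflight;

static void mark_inflight(uint16_t head) { inflight[head] = 1; } /* on fetch from avail ring */
static void mark_complete(uint16_t head) { inflight[head] = 0; } /* on push to used ring */

/* After a restart, resubmit every request that was fetched from the
 * avail ring but never pushed to the used ring. */
static void resubmit_inflight(void)
{
    for (uint16_t i = 0; i < VQ_SIZE; i++) {
        if (inflight[i]) {
            /* process_request(i);  -- hypothetical backend hook */
        }
    }
}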

Xie Yongji (6):
  char-socket: Enable "wait" option for client mode
  vhost-user: Add shared memory to record inflight I/O
  libvhost-user: Introduce vu_queue_map_desc()
  libvhost-user: Support recording inflight I/O in shared memory
  vhost-user-blk: Add support for reconnecting backend
  contrib/vhost-user-blk: enable inflight I/O recording

 chardev/char-socket.c                   |   5 +-
 contrib/libvhost-user/libvhost-user.c   | 215 ++++++++++++++++++++----
 contrib/libvhost-user/libvhost-user.h   |  19 +++
 contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
 hw/block/vhost-user-blk.c               | 169 +++++++++++++++++--
 hw/virtio/vhost-user.c                  |  69 ++++++++
 hw/virtio/vhost.c                       |   8 +
 include/hw/virtio/vhost-backend.h       |   4 +
 include/hw/virtio/vhost-user-blk.h      |   4 +
 include/hw/virtio/vhost-user.h          |   8 +
 10 files changed, 452 insertions(+), 52 deletions(-)

Comments

Marc-André Lureau Dec. 6, 2018, 7:23 a.m. UTC | #1
Hi

On Thu, Dec 6, 2018 at 10:36 AM <elohimes@gmail.com> wrote:
>
> From: Xie Yongji <xieyongji@baidu.com>
>
> This patchset is aimed at supporting qemu to reconnect
> vhost-user-blk backend after vhost-user-blk backend crash or
> restart.
>
> The patch 1 tries to implenment the sync connection for
> "reconnect socket".
>
> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> to support offering shared memory to backend to record
> its inflight I/O.
>
> The patch 3,4 are the corresponding libvhost-user patches of
> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>
> The patch 5 supports vhost-user-blk to reconnect backend when
> connection closed.
>
> The patch 6 tells qemu that we support reconnecting now.
>
> To use it, we could start qemu with:
>
> qemu-system-x86_64 \
>         -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>         -device vhost-user-blk-pci,chardev=char0 \

Why do you want QEMU to be the client, since it is actually the one
that serves and remains alive? Why make it try to reconnect regularly
when it could instead wait for a connection to come up?

>
> and start vhost-user-blk backend with:
>
> vhost-user-blk -b /path/file -s /path/vhost.socket
>
> Then we can restart vhost-user-blk at any time during VM running.
>
> Xie Yongji (6):
>   char-socket: Enable "wait" option for client mode
>   vhost-user: Add shared memory to record inflight I/O
>   libvhost-user: Introduce vu_queue_map_desc()
>   libvhost-user: Support recording inflight I/O in shared memory
>   vhost-user-blk: Add support for reconnecting backend
>   contrib/vhost-user-blk: enable inflight I/O recording
>
>  chardev/char-socket.c                   |   5 +-
>  contrib/libvhost-user/libvhost-user.c   | 215 ++++++++++++++++++++----
>  contrib/libvhost-user/libvhost-user.h   |  19 +++
>  contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
>  hw/block/vhost-user-blk.c               | 169 +++++++++++++++++--
>  hw/virtio/vhost-user.c                  |  69 ++++++++
>  hw/virtio/vhost.c                       |   8 +
>  include/hw/virtio/vhost-backend.h       |   4 +
>  include/hw/virtio/vhost-user-blk.h      |   4 +
>  include/hw/virtio/vhost-user.h          |   8 +
>  10 files changed, 452 insertions(+), 52 deletions(-)
>
> --
> 2.17.1
>
>
Yongji Xie Dec. 6, 2018, 7:43 a.m. UTC | #2
On Thu, 6 Dec 2018 at 15:23, Marc-André Lureau
<marcandre.lureau@gmail.com> wrote:
>
> Hi
>
> On Thu, Dec 6, 2018 at 10:36 AM <elohimes@gmail.com> wrote:
> >
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > This patchset is aimed at supporting qemu to reconnect
> > vhost-user-blk backend after vhost-user-blk backend crash or
> > restart.
> >
> > The patch 1 tries to implenment the sync connection for
> > "reconnect socket".
> >
> > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> > to support offering shared memory to backend to record
> > its inflight I/O.
> >
> > The patch 3,4 are the corresponding libvhost-user patches of
> > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> >
> > The patch 5 supports vhost-user-blk to reconnect backend when
> > connection closed.
> >
> > The patch 6 tells qemu that we support reconnecting now.
> >
> > To use it, we could start qemu with:
> >
> > qemu-system-x86_64 \
> >         -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >         -device vhost-user-blk-pci,chardev=char0 \
>
> Why do you want qemu to be the client since it is actually the one
> that serves and remains alive?  Why make it try to reconnect regularly
> when it could instead wait for a connection to come up?
>

Actually, this patchset should also work when QEMU is in server mode.
The reason I make QEMU the client is that some vhost-user backends,
such as SPDK and vhost-user-blk, may still work in server mode, and
it seems we cannot be sure that every vhost-user backend works in
client mode.

Thanks,
Yongji
Yury Kotov Dec. 6, 2018, 9:21 a.m. UTC | #3
Hi, this is a very interesting patchset.

I am also researching the reconnecting issue for vhost-user-blk and SPDK.
Do you support the case where the vhost backend has not started but QEMU has?

Regards,
Yury

06.12.2018, 09:37, "elohimes@gmail.com" <elohimes@gmail.com>:
> From: Xie Yongji <xieyongji@baidu.com>
>
> This patchset is aimed at supporting qemu to reconnect
> vhost-user-blk backend after vhost-user-blk backend crash or
> restart.
>
> The patch 1 tries to implenment the sync connection for
> "reconnect socket".
>
> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> to support offering shared memory to backend to record
> its inflight I/O.
>
> The patch 3,4 are the corresponding libvhost-user patches of
> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>
> The patch 5 supports vhost-user-blk to reconnect backend when
> connection closed.
>
> The patch 6 tells qemu that we support reconnecting now.
>
> To use it, we could start qemu with:
>
> qemu-system-x86_64 \
>         -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>         -device vhost-user-blk-pci,chardev=char0 \
>
> and start vhost-user-blk backend with:
>
> vhost-user-blk -b /path/file -s /path/vhost.socket
>
> Then we can restart vhost-user-blk at any time during VM running.
>
> Xie Yongji (6):
>   char-socket: Enable "wait" option for client mode
>   vhost-user: Add shared memory to record inflight I/O
>   libvhost-user: Introduce vu_queue_map_desc()
>   libvhost-user: Support recording inflight I/O in shared memory
>   vhost-user-blk: Add support for reconnecting backend
>   contrib/vhost-user-blk: enable inflight I/O recording
>
>  chardev/char-socket.c | 5 +-
>  contrib/libvhost-user/libvhost-user.c | 215 ++++++++++++++++++++----
>  contrib/libvhost-user/libvhost-user.h | 19 +++
>  contrib/vhost-user-blk/vhost-user-blk.c | 3 +-
>  hw/block/vhost-user-blk.c | 169 +++++++++++++++++--
>  hw/virtio/vhost-user.c | 69 ++++++++
>  hw/virtio/vhost.c | 8 +
>  include/hw/virtio/vhost-backend.h | 4 +
>  include/hw/virtio/vhost-user-blk.h | 4 +
>  include/hw/virtio/vhost-user.h | 8 +
>  10 files changed, 452 insertions(+), 52 deletions(-)
>
> --
> 2.17.1
Yongji Xie Dec. 6, 2018, 9:41 a.m. UTC | #4
On Thu, 6 Dec 2018 at 17:21, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
>
> Hi, it's very interesting patchset.
>
> I also research reconnecting issue for vhost-user-blk and SPDK.
> Did you support a case when vhost backend is not started but QEMU does?
>

We do not support this case now, because QEMU has to get the config
from the vhost-user backend through VHOST_USER_GET_CONFIG when we
initialize the vhost-user-blk device.

If we delayed getting the config until we connect to the vhost-user
backend, the guest driver might get an incorrect configuration.
That's why I modified the "wait" option to support client mode.

Thanks,
Yongji
Yury Kotov Dec. 6, 2018, 9:52 a.m. UTC | #5
Yes, I also think that realize should be synchronous.

But maybe it's better to add a 'disconnected' option to initialize
the chardev in the disconnected state, then do the first connection
with qemu_chr_fe_wait_connected() from vhost_user_blk_realize(). That
way, when the connection breaks during realize, we can try again.
What do you think?

Regards,
Yury

06.12.2018, 12:42, "Yongji Xie" <elohimes@gmail.com>:
> On Thu, 6 Dec 2018 at 17:21, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
>>  Hi, it's very interesting patchset.
>>
>>  I also research reconnecting issue for vhost-user-blk and SPDK.
>>  Did you support a case when vhost backend is not started but QEMU does?
>
> Now we do not support this case. Because qemu have to get config from vhost-user
> backend through VHOST_USER_GET_CONFIG when we initialize the
> vhost-user-blk device.
>
> If we delay getting config until we connect vhost-user backend, guest
> driver may get incorrect
> configuration? That's why I modify the "wait" option to support client mode.
>
> Thanks,
> Yongji
Yongji Xie Dec. 6, 2018, 10:35 a.m. UTC | #6
On Thu, 6 Dec 2018 at 17:52, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
>
> Yes, I also think that realize shout be sync.
>
> But may be it's better to add an 'disconnected' option to init the chardev
> in disconnected state, then do the first connection with
> qemu_chr_fe_wait_connected from vhost_user_blk_realize. So when
> connection will be broken in realize we can try again.
> What do you think about it?
>

Sounds good to me. And maybe we need to add a timeout for the retries.

Thanks,
Yongji
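
A minimal sketch of the realize-time loop discussed above, assuming
QEMU's internal headers. qemu_chr_fe_wait_connected() is the existing
chardev helper; the vhost_user_blk_wait_backend() name and the retry
budget are hypothetical:

/* Sketch only: bound the initial connection attempts during realize
 * instead of failing on the first broken connection. */
static int vhost_user_blk_wait_backend(VHostUserBlk *s, Error **errp)
{
    Error *local_err = NULL;
    int retries = 3;                    /* hypothetical timeout policy */

    while (retries--) {
        if (qemu_chr_fe_wait_connected(&s->chardev, &local_err) == 0) {
            return 0;                   /* connected; realize continues */
        }
        if (retries) {
            error_free(local_err);      /* drop the error and retry */
            local_err = NULL;
        }
    }
    error_propagate(errp, local_err);   /* give up after the budget */
    return -1;
}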
Jason Wang Dec. 6, 2018, 1:57 p.m. UTC | #7
On 2018/12/6 2:35 PM, elohimes@gmail.com wrote:
> From: Xie Yongji <xieyongji@baidu.com>
>
> This patchset is aimed at supporting qemu to reconnect
> vhost-user-blk backend after vhost-user-blk backend crash or
> restart.
>
> The patch 1 tries to implenment the sync connection for
> "reconnect socket".
>
> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> to support offering shared memory to backend to record
> its inflight I/O.
>
> The patch 3,4 are the corresponding libvhost-user patches of
> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>
> The patch 5 supports vhost-user-blk to reconnect backend when
> connection closed.
>
> The patch 6 tells qemu that we support reconnecting now.
>
> To use it, we could start qemu with:
>
> qemu-system-x86_64 \
>          -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>          -device vhost-user-blk-pci,chardev=char0 \
>
> and start vhost-user-blk backend with:
>
> vhost-user-blk -b /path/file -s /path/vhost.socket
>
> Then we can restart vhost-user-blk at any time during VM running.


I wonder whether it's better to handle this at the level of the
virtio protocol itself instead of the vhost-user level. E.g.,
exposing last_avail_idx to the driver might be sufficient?

Another possible issue: it looks like you need to deal with
different kinds of ring layouts, e.g. packed virtqueues.

Thanks


>
> Xie Yongji (6):
>    char-socket: Enable "wait" option for client mode
>    vhost-user: Add shared memory to record inflight I/O
>    libvhost-user: Introduce vu_queue_map_desc()
>    libvhost-user: Support recording inflight I/O in shared memory
>    vhost-user-blk: Add support for reconnecting backend
>    contrib/vhost-user-blk: enable inflight I/O recording
>
>   chardev/char-socket.c                   |   5 +-
>   contrib/libvhost-user/libvhost-user.c   | 215 ++++++++++++++++++++----
>   contrib/libvhost-user/libvhost-user.h   |  19 +++
>   contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
>   hw/block/vhost-user-blk.c               | 169 +++++++++++++++++--
>   hw/virtio/vhost-user.c                  |  69 ++++++++
>   hw/virtio/vhost.c                       |   8 +
>   include/hw/virtio/vhost-backend.h       |   4 +
>   include/hw/virtio/vhost-user-blk.h      |   4 +
>   include/hw/virtio/vhost-user.h          |   8 +
>   10 files changed, 452 insertions(+), 52 deletions(-)
>
Michael S. Tsirkin Dec. 6, 2018, 1:59 p.m. UTC | #8
On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
> 
> On 2018/12/6 下午2:35, elohimes@gmail.com wrote:
> > From: Xie Yongji <xieyongji@baidu.com>
> > 
> > This patchset is aimed at supporting qemu to reconnect
> > vhost-user-blk backend after vhost-user-blk backend crash or
> > restart.
> > 
> > The patch 1 tries to implenment the sync connection for
> > "reconnect socket".
> > 
> > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> > to support offering shared memory to backend to record
> > its inflight I/O.
> > 
> > The patch 3,4 are the corresponding libvhost-user patches of
> > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> > 
> > The patch 5 supports vhost-user-blk to reconnect backend when
> > connection closed.
> > 
> > The patch 6 tells qemu that we support reconnecting now.
> > 
> > To use it, we could start qemu with:
> > 
> > qemu-system-x86_64 \
> >          -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >          -device vhost-user-blk-pci,chardev=char0 \
> > 
> > and start vhost-user-blk backend with:
> > 
> > vhost-user-blk -b /path/file -s /path/vhost.socket
> > 
> > Then we can restart vhost-user-blk at any time during VM running.
> 
> 
> I wonder whether or not it's better to handle this at the level of virtio
> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
> driver might be sufficient?
> 
> Another possible issue is, looks like you need to deal with different kinds
> of ring layouts e.g packed virtqueues.
> 
> Thanks

I'm not sure I understand your comments here.
All of these would be guest-visible extensions.
Possible, for sure, but how is this related to
a patch supporting transparent reconnects?

> 
> > 
> > Xie Yongji (6):
> >    char-socket: Enable "wait" option for client mode
> >    vhost-user: Add shared memory to record inflight I/O
> >    libvhost-user: Introduce vu_queue_map_desc()
> >    libvhost-user: Support recording inflight I/O in shared memory
> >    vhost-user-blk: Add support for reconnecting backend
> >    contrib/vhost-user-blk: enable inflight I/O recording
> > 
> >   chardev/char-socket.c                   |   5 +-
> >   contrib/libvhost-user/libvhost-user.c   | 215 ++++++++++++++++++++----
> >   contrib/libvhost-user/libvhost-user.h   |  19 +++
> >   contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
> >   hw/block/vhost-user-blk.c               | 169 +++++++++++++++++--
> >   hw/virtio/vhost-user.c                  |  69 ++++++++
> >   hw/virtio/vhost.c                       |   8 +
> >   include/hw/virtio/vhost-backend.h       |   4 +
> >   include/hw/virtio/vhost-user-blk.h      |   4 +
> >   include/hw/virtio/vhost-user.h          |   8 +
> >   10 files changed, 452 insertions(+), 52 deletions(-)
> >
Jason Wang Dec. 6, 2018, 2 p.m. UTC | #9
On 2018/12/6 9:57 PM, Jason Wang wrote:
>
> On 2018/12/6 下午2:35, elohimes@gmail.com wrote:
>> From: Xie Yongji <xieyongji@baidu.com>
>>
>> This patchset is aimed at supporting qemu to reconnect
>> vhost-user-blk backend after vhost-user-blk backend crash or
>> restart.
>>
>> The patch 1 tries to implenment the sync connection for
>> "reconnect socket".
>>
>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
>> to support offering shared memory to backend to record
>> its inflight I/O.
>>
>> The patch 3,4 are the corresponding libvhost-user patches of
>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>>
>> The patch 5 supports vhost-user-blk to reconnect backend when
>> connection closed.
>>
>> The patch 6 tells qemu that we support reconnecting now.
>>
>> To use it, we could start qemu with:
>>
>> qemu-system-x86_64 \
>>          -chardev 
>> socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>>          -device vhost-user-blk-pci,chardev=char0 \
>>
>> and start vhost-user-blk backend with:
>>
>> vhost-user-blk -b /path/file -s /path/vhost.socket
>>
>> Then we can restart vhost-user-blk at any time during VM running.
>
>
> I wonder whether or not it's better to handle this at the level of 
> virtio protocol itself instead of vhost-user level. E.g expose 
> last_avail_idx to driver might be sufficient?
>
> Another possible issue is, looks like you need to deal with different 
> kinds of ring layouts e.g packed virtqueues.
>
> Thanks 


Cc'ing Maxime, who wrote the vhost-user reconnecting support, for more thoughts.

Thanks
Yongji Xie Dec. 7, 2018, 8:56 a.m. UTC | #10
On Thu, 6 Dec 2018 at 21:57, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2018/12/6 下午2:35, elohimes@gmail.com wrote:
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > This patchset is aimed at supporting qemu to reconnect
> > vhost-user-blk backend after vhost-user-blk backend crash or
> > restart.
> >
> > The patch 1 tries to implenment the sync connection for
> > "reconnect socket".
> >
> > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> > to support offering shared memory to backend to record
> > its inflight I/O.
> >
> > The patch 3,4 are the corresponding libvhost-user patches of
> > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> >
> > The patch 5 supports vhost-user-blk to reconnect backend when
> > connection closed.
> >
> > The patch 6 tells qemu that we support reconnecting now.
> >
> > To use it, we could start qemu with:
> >
> > qemu-system-x86_64 \
> >          -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >          -device vhost-user-blk-pci,chardev=char0 \
> >
> > and start vhost-user-blk backend with:
> >
> > vhost-user-blk -b /path/file -s /path/vhost.socket
> >
> > Then we can restart vhost-user-blk at any time during VM running.
>
>
> I wonder whether or not it's better to handle this at the level of
> virtio protocol itself instead of vhost-user level. E.g expose
> last_avail_idx to driver might be sufficient?
>
> Another possible issue is, looks like you need to deal with different
> kinds of ring layouts e.g packed virtqueues.
>

Yes, we should be able to deal with packed virtqueues. But this
should be handled in the backend rather than in QEMU. QEMU just sends
a shared memory region to the backend; the backend then uses it to
record inflight I/O and replays that I/O when necessary.

Thanks,
Yongji
Jason Wang Dec. 10, 2018, 9:32 a.m. UTC | #11
On 2018/12/6 9:59 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
>> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
>>> From: Xie Yongji<xieyongji@baidu.com>
>>>
>>> This patchset is aimed at supporting qemu to reconnect
>>> vhost-user-blk backend after vhost-user-blk backend crash or
>>> restart.
>>>
>>> The patch 1 tries to implenment the sync connection for
>>> "reconnect socket".
>>>
>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
>>> to support offering shared memory to backend to record
>>> its inflight I/O.
>>>
>>> The patch 3,4 are the corresponding libvhost-user patches of
>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>>>
>>> The patch 5 supports vhost-user-blk to reconnect backend when
>>> connection closed.
>>>
>>> The patch 6 tells qemu that we support reconnecting now.
>>>
>>> To use it, we could start qemu with:
>>>
>>> qemu-system-x86_64 \
>>>           -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>>>           -device vhost-user-blk-pci,chardev=char0 \
>>>
>>> and start vhost-user-blk backend with:
>>>
>>> vhost-user-blk -b /path/file -s /path/vhost.socket
>>>
>>> Then we can restart vhost-user-blk at any time during VM running.
>> I wonder whether or not it's better to handle this at the level of virtio
>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
>> driver might be sufficient?
>>
>> Another possible issue is, looks like you need to deal with different kinds
>> of ring layouts e.g packed virtqueues.
>>
>> Thanks
> I'm not sure I understand your comments here.
> All these would be guest-visible extensions.


It seems not: it only introduces a shared memory region between QEMU
and the vhost-user backend?


> Possible for sure but how is this related to
> a patch supporting transparent reconnects?


I might be missing something. My understanding is that we support
transparent reconnects, but we can't deduce an accurate
last_avail_idx, and that is the capability this series tries to add.
To me, this series is functionally equivalent to exposing
last_avail_idx (or avail_idx_cons) in the available ring. That way
the information is inside guest memory, and the vhost-user backend
can access and update it directly. I believe some modern NICs do
this as well (though the index is in an MMIO area, of course).

Thanks
Yongji Xie Dec. 12, 2018, 2:48 a.m. UTC | #12
On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
> > On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
> >> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
> >>> From: Xie Yongji<xieyongji@baidu.com>
> >>>
> >>> This patchset is aimed at supporting qemu to reconnect
> >>> vhost-user-blk backend after vhost-user-blk backend crash or
> >>> restart.
> >>>
> >>> The patch 1 tries to implenment the sync connection for
> >>> "reconnect socket".
> >>>
> >>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> >>> to support offering shared memory to backend to record
> >>> its inflight I/O.
> >>>
> >>> The patch 3,4 are the corresponding libvhost-user patches of
> >>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> >>>
> >>> The patch 5 supports vhost-user-blk to reconnect backend when
> >>> connection closed.
> >>>
> >>> The patch 6 tells qemu that we support reconnecting now.
> >>>
> >>> To use it, we could start qemu with:
> >>>
> >>> qemu-system-x86_64 \
> >>>           -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >>>           -device vhost-user-blk-pci,chardev=char0 \
> >>>
> >>> and start vhost-user-blk backend with:
> >>>
> >>> vhost-user-blk -b /path/file -s /path/vhost.socket
> >>>
> >>> Then we can restart vhost-user-blk at any time during VM running.
> >> I wonder whether or not it's better to handle this at the level of virtio
> >> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
> >> driver might be sufficient?
> >>
> >> Another possible issue is, looks like you need to deal with different kinds
> >> of ring layouts e.g packed virtqueues.
> >>
> >> Thanks
> > I'm not sure I understand your comments here.
> > All these would be guest-visible extensions.
>
>
> Looks not, it only introduces a shared memory between qemu and
> vhost-user backend?
>
>
> > Possible for sure but how is this related to
> > a patch supporting transparent reconnects?
>
>
> I might miss something. My understanding is that we support transparent
> reconnects, but we can't deduce an accurate last_avail_idx and this is
> what capability this series try to add. To me, this series is functional
> equivalent to expose last_avail_idx (or avail_idx_cons) in available
> ring. So the information is inside guest memory, vhost-user backend can
> access it and update it directly. I believe this is some modern NIC did
> as well (but index is in MMIO area of course).
>

Hi Jason,

If my understanding is correct, it might not be enough to only expose
last_avail_idx, because sometimes we do not process descriptors in
the same order in which they were made available. If so, we can't get
correct inflight I/O information from the available ring.

Thanks,
Yongji
Jason Wang Dec. 12, 2018, 3 a.m. UTC | #13
On 2018/12/12 10:48 AM, Yongji Xie wrote:
> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
>>>> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
>>>>> From: Xie Yongji<xieyongji@baidu.com>
>>>>>
>>>>> This patchset is aimed at supporting qemu to reconnect
>>>>> vhost-user-blk backend after vhost-user-blk backend crash or
>>>>> restart.
>>>>>
>>>>> The patch 1 tries to implenment the sync connection for
>>>>> "reconnect socket".
>>>>>
>>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
>>>>> to support offering shared memory to backend to record
>>>>> its inflight I/O.
>>>>>
>>>>> The patch 3,4 are the corresponding libvhost-user patches of
>>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>>>>>
>>>>> The patch 5 supports vhost-user-blk to reconnect backend when
>>>>> connection closed.
>>>>>
>>>>> The patch 6 tells qemu that we support reconnecting now.
>>>>>
>>>>> To use it, we could start qemu with:
>>>>>
>>>>> qemu-system-x86_64 \
>>>>>            -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>>>>>            -device vhost-user-blk-pci,chardev=char0 \
>>>>>
>>>>> and start vhost-user-blk backend with:
>>>>>
>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket
>>>>>
>>>>> Then we can restart vhost-user-blk at any time during VM running.
>>>> I wonder whether or not it's better to handle this at the level of virtio
>>>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
>>>> driver might be sufficient?
>>>>
>>>> Another possible issue is, looks like you need to deal with different kinds
>>>> of ring layouts e.g packed virtqueues.
>>>>
>>>> Thanks
>>> I'm not sure I understand your comments here.
>>> All these would be guest-visible extensions.
>>
>> Looks not, it only introduces a shared memory between qemu and
>> vhost-user backend?
>>
>>
>>> Possible for sure but how is this related to
>>> a patch supporting transparent reconnects?
>>
>> I might miss something. My understanding is that we support transparent
>> reconnects, but we can't deduce an accurate last_avail_idx and this is
>> what capability this series try to add. To me, this series is functional
>> equivalent to expose last_avail_idx (or avail_idx_cons) in available
>> ring. So the information is inside guest memory, vhost-user backend can
>> access it and update it directly. I believe this is some modern NIC did
>> as well (but index is in MMIO area of course).
>>
> Hi Jason,
>
> If my understand is correct, it might be not enough to only expose
> last_avail_idx.
> Because we would not process descriptors in the same order in which they have
> been made available sometimes. If so, we can't get correct inflight
> I/O information
> from available ring.


You can get this with the help of both the used ring and
last_avail_idx, I believe. Or maybe you can give us an example?

Thanks


>
> Thanks,
> Yongji
Yongji Xie Dec. 12, 2018, 3:21 a.m. UTC | #14
On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2018/12/12 上午10:48, Yongji Xie wrote:
> > On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
> >>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
> >>>> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
> >>>>> From: Xie Yongji<xieyongji@baidu.com>
> >>>>>
> >>>>> This patchset is aimed at supporting qemu to reconnect
> >>>>> vhost-user-blk backend after vhost-user-blk backend crash or
> >>>>> restart.
> >>>>>
> >>>>> The patch 1 tries to implenment the sync connection for
> >>>>> "reconnect socket".
> >>>>>
> >>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> >>>>> to support offering shared memory to backend to record
> >>>>> its inflight I/O.
> >>>>>
> >>>>> The patch 3,4 are the corresponding libvhost-user patches of
> >>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> >>>>>
> >>>>> The patch 5 supports vhost-user-blk to reconnect backend when
> >>>>> connection closed.
> >>>>>
> >>>>> The patch 6 tells qemu that we support reconnecting now.
> >>>>>
> >>>>> To use it, we could start qemu with:
> >>>>>
> >>>>> qemu-system-x86_64 \
> >>>>>            -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >>>>>            -device vhost-user-blk-pci,chardev=char0 \
> >>>>>
> >>>>> and start vhost-user-blk backend with:
> >>>>>
> >>>>> vhost-user-blk -b /path/file -s /path/vhost.socket
> >>>>>
> >>>>> Then we can restart vhost-user-blk at any time during VM running.
> >>>> I wonder whether or not it's better to handle this at the level of virtio
> >>>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
> >>>> driver might be sufficient?
> >>>>
> >>>> Another possible issue is, looks like you need to deal with different kinds
> >>>> of ring layouts e.g packed virtqueues.
> >>>>
> >>>> Thanks
> >>> I'm not sure I understand your comments here.
> >>> All these would be guest-visible extensions.
> >>
> >> Looks not, it only introduces a shared memory between qemu and
> >> vhost-user backend?
> >>
> >>
> >>> Possible for sure but how is this related to
> >>> a patch supporting transparent reconnects?
> >>
> >> I might miss something. My understanding is that we support transparent
> >> reconnects, but we can't deduce an accurate last_avail_idx and this is
> >> what capability this series try to add. To me, this series is functional
> >> equivalent to expose last_avail_idx (or avail_idx_cons) in available
> >> ring. So the information is inside guest memory, vhost-user backend can
> >> access it and update it directly. I believe this is some modern NIC did
> >> as well (but index is in MMIO area of course).
> >>
> > Hi Jason,
> >
> > If my understand is correct, it might be not enough to only expose
> > last_avail_idx.
> > Because we would not process descriptors in the same order in which they have
> > been made available sometimes. If so, we can't get correct inflight
> > I/O information
> > from available ring.
>
>
> You can get this with the help of the both used ring and last_avail_idx
> I believe. Or maybe you can give us an example?
>

A simple example; assume the ring size is 8:

1. The guest fills the avail ring:

avail ring: 0 1 2 3 4 5 6 7
used ring:

2. The vhost-user backend completes 4, 5, 6, 7 and fills the used ring:

avail ring: 0 1 2 3 4 5 6 7
used ring: 4 5 6 7

3. The guest fills the avail ring again:

avail ring: 4 5 6 7 4 5 6 7
used ring: 4 5 6 7

4. The vhost-user backend crashes.

The inflight descriptors 0, 1, 2 and 3 are lost.

Thanks,
Yongji
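
For contrast with the example above: a per-head record of the kind
sketched under the cover letter (the same hypothetical
mark_inflight()/mark_complete() helpers, not this series' exact
layout) keeps heads 0-3 recoverable through the same four steps:

/* ring size 8; inflight[] is the shared region, one flag per head */
for (uint16_t i = 0; i < 8; i++) mark_inflight(i);  /* step 1: 0-7 made available */
for (uint16_t i = 4; i < 8; i++) mark_complete(i);  /* step 2: 4-7 completed */
for (uint16_t i = 4; i < 8; i++) mark_inflight(i);  /* step 3: heads 4-7 reused */
/* step 4: backend crashes -- inflight[0..3] are still set, so the
 * restarted backend can resubmit them after reconnect. */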
Jason Wang Dec. 12, 2018, 4:06 a.m. UTC | #15
On 2018/12/12 11:21 AM, Yongji Xie wrote:
> On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2018/12/12 上午10:48, Yongji Xie wrote:
>>> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
>>>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
>>>>>> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
>>>>>>> From: Xie Yongji<xieyongji@baidu.com>
>>>>>>>
>>>>>>> This patchset is aimed at supporting qemu to reconnect
>>>>>>> vhost-user-blk backend after vhost-user-blk backend crash or
>>>>>>> restart.
>>>>>>>
>>>>>>> The patch 1 tries to implenment the sync connection for
>>>>>>> "reconnect socket".
>>>>>>>
>>>>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
>>>>>>> to support offering shared memory to backend to record
>>>>>>> its inflight I/O.
>>>>>>>
>>>>>>> The patch 3,4 are the corresponding libvhost-user patches of
>>>>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>>>>>>>
>>>>>>> The patch 5 supports vhost-user-blk to reconnect backend when
>>>>>>> connection closed.
>>>>>>>
>>>>>>> The patch 6 tells qemu that we support reconnecting now.
>>>>>>>
>>>>>>> To use it, we could start qemu with:
>>>>>>>
>>>>>>> qemu-system-x86_64 \
>>>>>>>             -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>>>>>>>             -device vhost-user-blk-pci,chardev=char0 \
>>>>>>>
>>>>>>> and start vhost-user-blk backend with:
>>>>>>>
>>>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket
>>>>>>>
>>>>>>> Then we can restart vhost-user-blk at any time during VM running.
>>>>>> I wonder whether or not it's better to handle this at the level of virtio
>>>>>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
>>>>>> driver might be sufficient?
>>>>>>
>>>>>> Another possible issue is, looks like you need to deal with different kinds
>>>>>> of ring layouts e.g packed virtqueues.
>>>>>>
>>>>>> Thanks
>>>>> I'm not sure I understand your comments here.
>>>>> All these would be guest-visible extensions.
>>>> Looks not, it only introduces a shared memory between qemu and
>>>> vhost-user backend?
>>>>
>>>>
>>>>> Possible for sure but how is this related to
>>>>> a patch supporting transparent reconnects?
>>>> I might miss something. My understanding is that we support transparent
>>>> reconnects, but we can't deduce an accurate last_avail_idx and this is
>>>> what capability this series try to add. To me, this series is functional
>>>> equivalent to expose last_avail_idx (or avail_idx_cons) in available
>>>> ring. So the information is inside guest memory, vhost-user backend can
>>>> access it and update it directly. I believe this is some modern NIC did
>>>> as well (but index is in MMIO area of course).
>>>>
>>> Hi Jason,
>>>
>>> If my understand is correct, it might be not enough to only expose
>>> last_avail_idx.
>>> Because we would not process descriptors in the same order in which they have
>>> been made available sometimes. If so, we can't get correct inflight
>>> I/O information
>>> from available ring.
>>
>> You can get this with the help of the both used ring and last_avail_idx
>> I believe. Or maybe you can give us an example?
>>
> A simple example, we assume ring size is 8:
>
> 1. guest fill avail ring
>
> avail ring: 0 1 2 3 4 5 6 7
> used ring:
>
> 2. vhost-user backend complete 4,5,6,7 and fill used ring
>
> avail ring: 0 1 2 3 4 5 6 7
> used ring: 4 5 6 7
>
> 3. guest fill avail ring again
>
> avail ring: 4 5 6 7 4 5 6 7
> used ring: 4 5 6 7
>
> 4. vhost-user backend crash
>
> The inflight descriptors 0, 1, 2, 3 lost.
>
> Thanks,
> Yongji


OK, then we can simply forbid increasing avail_idx in this case?

Basically, it's a question of whether it's better to do this at the
virtio level instead of the vhost level. I'm pretty sure that if we
expose sufficient information, it could be done without touching
vhost-user, and we won't have to deal with e.g. migration and other cases.

Thanks
Yongji Xie Dec. 12, 2018, 6:41 a.m. UTC | #16
On Wed, 12 Dec 2018 at 12:07, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2018/12/12 上午11:21, Yongji Xie wrote:
> > On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2018/12/12 上午10:48, Yongji Xie wrote:
> >>> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
> >>>> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
> >>>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
> >>>>>> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
> >>>>>>> From: Xie Yongji<xieyongji@baidu.com>
> >>>>>>>
> >>>>>>> This patchset is aimed at supporting qemu to reconnect
> >>>>>>> vhost-user-blk backend after vhost-user-blk backend crash or
> >>>>>>> restart.
> >>>>>>>
> >>>>>>> The patch 1 tries to implenment the sync connection for
> >>>>>>> "reconnect socket".
> >>>>>>>
> >>>>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> >>>>>>> to support offering shared memory to backend to record
> >>>>>>> its inflight I/O.
> >>>>>>>
> >>>>>>> The patch 3,4 are the corresponding libvhost-user patches of
> >>>>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> >>>>>>>
> >>>>>>> The patch 5 supports vhost-user-blk to reconnect backend when
> >>>>>>> connection closed.
> >>>>>>>
> >>>>>>> The patch 6 tells qemu that we support reconnecting now.
> >>>>>>>
> >>>>>>> To use it, we could start qemu with:
> >>>>>>>
> >>>>>>> qemu-system-x86_64 \
> >>>>>>>             -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >>>>>>>             -device vhost-user-blk-pci,chardev=char0 \
> >>>>>>>
> >>>>>>> and start vhost-user-blk backend with:
> >>>>>>>
> >>>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket
> >>>>>>>
> >>>>>>> Then we can restart vhost-user-blk at any time during VM running.
> >>>>>> I wonder whether or not it's better to handle this at the level of virtio
> >>>>>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
> >>>>>> driver might be sufficient?
> >>>>>>
> >>>>>> Another possible issue is, looks like you need to deal with different kinds
> >>>>>> of ring layouts e.g packed virtqueues.
> >>>>>>
> >>>>>> Thanks
> >>>>> I'm not sure I understand your comments here.
> >>>>> All these would be guest-visible extensions.
> >>>> Looks not, it only introduces a shared memory between qemu and
> >>>> vhost-user backend?
> >>>>
> >>>>
> >>>>> Possible for sure but how is this related to
> >>>>> a patch supporting transparent reconnects?
> >>>> I might miss something. My understanding is that we support transparent
> >>>> reconnects, but we can't deduce an accurate last_avail_idx and this is
> >>>> what capability this series try to add. To me, this series is functional
> >>>> equivalent to expose last_avail_idx (or avail_idx_cons) in available
> >>>> ring. So the information is inside guest memory, vhost-user backend can
> >>>> access it and update it directly. I believe this is some modern NIC did
> >>>> as well (but index is in MMIO area of course).
> >>>>
> >>> Hi Jason,
> >>>
> >>> If my understand is correct, it might be not enough to only expose
> >>> last_avail_idx.
> >>> Because we would not process descriptors in the same order in which they have
> >>> been made available sometimes. If so, we can't get correct inflight
> >>> I/O information
> >>> from available ring.
> >>
> >> You can get this with the help of the both used ring and last_avail_idx
> >> I believe. Or maybe you can give us an example?
> >>
> > A simple example, we assume ring size is 8:
> >
> > 1. guest fill avail ring
> >
> > avail ring: 0 1 2 3 4 5 6 7
> > used ring:
> >
> > 2. vhost-user backend complete 4,5,6,7 and fill used ring
> >
> > avail ring: 0 1 2 3 4 5 6 7
> > used ring: 4 5 6 7
> >
> > 3. guest fill avail ring again
> >
> > avail ring: 4 5 6 7 4 5 6 7
> > used ring: 4 5 6 7
> >
> > 4. vhost-user backend crash
> >
> > The inflight descriptors 0, 1, 2, 3 lost.
> >
> > Thanks,
> > Yongji
>
>
> Ok, then we can simply forbid increasing the avail_idx in this case?
>
> Basically, it's a question of whether or not it's better to done it in
> the level of virtio instead of vhost. I'm pretty sure if we expose
> sufficient information, it could be done without touching vhost-user.
> And we won't deal with e.g migration and other cases.
>

OK, I get your point. That's indeed an alternative way. But this
feature seems to be useful only to vhost-user backends. I'm not sure
whether it makes sense to touch the virtio protocol for this feature.

Thanks,
Yongji
Jason Wang Dec. 12, 2018, 7:47 a.m. UTC | #17
On 2018/12/12 2:41 PM, Yongji Xie wrote:
> On Wed, 12 Dec 2018 at 12:07, Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2018/12/12 上午11:21, Yongji Xie wrote:
>>> On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2018/12/12 上午10:48, Yongji Xie wrote:
>>>>> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
>>>>>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
>>>>>>>> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
>>>>>>>>> From: Xie Yongji<xieyongji@baidu.com>
>>>>>>>>>
>>>>>>>>> This patchset is aimed at supporting qemu to reconnect
>>>>>>>>> vhost-user-blk backend after vhost-user-blk backend crash or
>>>>>>>>> restart.
>>>>>>>>>
>>>>>>>>> The patch 1 tries to implenment the sync connection for
>>>>>>>>> "reconnect socket".
>>>>>>>>>
>>>>>>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
>>>>>>>>> to support offering shared memory to backend to record
>>>>>>>>> its inflight I/O.
>>>>>>>>>
>>>>>>>>> The patch 3,4 are the corresponding libvhost-user patches of
>>>>>>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>>>>>>>>>
>>>>>>>>> The patch 5 supports vhost-user-blk to reconnect backend when
>>>>>>>>> connection closed.
>>>>>>>>>
>>>>>>>>> The patch 6 tells qemu that we support reconnecting now.
>>>>>>>>>
>>>>>>>>> To use it, we could start qemu with:
>>>>>>>>>
>>>>>>>>> qemu-system-x86_64 \
>>>>>>>>>              -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>>>>>>>>>              -device vhost-user-blk-pci,chardev=char0 \
>>>>>>>>>
>>>>>>>>> and start vhost-user-blk backend with:
>>>>>>>>>
>>>>>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket
>>>>>>>>>
>>>>>>>>> Then we can restart vhost-user-blk at any time during VM running.
>>>>>>>> I wonder whether or not it's better to handle this at the level of virtio
>>>>>>>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
>>>>>>>> driver might be sufficient?
>>>>>>>>
>>>>>>>> Another possible issue is, looks like you need to deal with different kinds
>>>>>>>> of ring layouts e.g packed virtqueues.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>> I'm not sure I understand your comments here.
>>>>>>> All these would be guest-visible extensions.
>>>>>> Looks not, it only introduces a shared memory between qemu and
>>>>>> vhost-user backend?
>>>>>>
>>>>>>
>>>>>>> Possible for sure but how is this related to
>>>>>>> a patch supporting transparent reconnects?
>>>>>> I might miss something. My understanding is that we support transparent
>>>>>> reconnects, but we can't deduce an accurate last_avail_idx and this is
>>>>>> what capability this series try to add. To me, this series is functional
>>>>>> equivalent to expose last_avail_idx (or avail_idx_cons) in available
>>>>>> ring. So the information is inside guest memory, vhost-user backend can
>>>>>> access it and update it directly. I believe this is some modern NIC did
>>>>>> as well (but index is in MMIO area of course).
>>>>>>
>>>>> Hi Jason,
>>>>>
>>>>> If my understand is correct, it might be not enough to only expose
>>>>> last_avail_idx.
>>>>> Because we would not process descriptors in the same order in which they have
>>>>> been made available sometimes. If so, we can't get correct inflight
>>>>> I/O information
>>>>> from available ring.
>>>> You can get this with the help of the both used ring and last_avail_idx
>>>> I believe. Or maybe you can give us an example?
>>>>
>>> A simple example, we assume ring size is 8:
>>>
>>> 1. guest fill avail ring
>>>
>>> avail ring: 0 1 2 3 4 5 6 7
>>> used ring:
>>>
>>> 2. vhost-user backend complete 4,5,6,7 and fill used ring
>>>
>>> avail ring: 0 1 2 3 4 5 6 7
>>> used ring: 4 5 6 7
>>>
>>> 3. guest fill avail ring again
>>>
>>> avail ring: 4 5 6 7 4 5 6 7
>>> used ring: 4 5 6 7
>>>
>>> 4. vhost-user backend crash
>>>
>>> The inflight descriptors 0, 1, 2, 3 lost.
>>>
>>> Thanks,
>>> Yongji
>>
>> Ok, then we can simply forbid increasing the avail_idx in this case?
>>
>> Basically, it's a question of whether or not it's better to done it in
>> the level of virtio instead of vhost. I'm pretty sure if we expose
>> sufficient information, it could be done without touching vhost-user.
>> And we won't deal with e.g migration and other cases.
>>
> OK, I get your point. That's indeed an alternative way. But this feature seems
> to be only useful to vhost-user backend.


I admit I could not think of a use case other than vhost-user.


>   I'm not sure whether it make sense to
> touch virtio protocol for this feature.


Some possible advantages:

- The feature could be determined and noticed by the user or the
management layer.

- There's no need to invent a ring-layout-specific protocol to record
inflight descriptors. E.g., if my understanding is correct, this
series and the example above still cannot work for packed virtqueues,
since the descriptor id is not sufficient (the descriptor could be
overwritten by a used one). You probably need a (partial) copy of the
descriptor ring for this (see the sketch after this message).

- There's no need to deal with migration; all the information is in
guest memory.

Thanks

>
> Thanks,
> Yongji
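
A sketch of the (partial) descriptor-ring copy mentioned in the
second point above (a hypothetical layout; nothing like this exists
in the series yet): for packed virtqueues the backend would have to
snapshot each descriptor when it fetches it, because the ring slot
itself may be overwritten by a used element before the request
completes.

#include <stdint.h>

/* Hypothetical per-request snapshot for packed virtqueues */
struct inflight_desc {
    uint64_t addr;     /* copied from the descriptor when it is fetched */
    uint32_t len;
    uint16_t id;
    uint8_t  inflight; /* 1 = fetched but not yet completed */
};

struct inflight_region {
    uint16_t desc_num;            /* ring size */
    struct inflight_desc desc[];  /* one snapshot per outstanding request */
};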
Yongji Xie Dec. 12, 2018, 9:18 a.m. UTC | #18
On Wed, 12 Dec 2018 at 15:47, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2018/12/12 下午2:41, Yongji Xie wrote:
> > On Wed, 12 Dec 2018 at 12:07, Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2018/12/12 上午11:21, Yongji Xie wrote:
> >>> On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote:
> >>>> On 2018/12/12 上午10:48, Yongji Xie wrote:
> >>>>> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
> >>>>>> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
> >>>>>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
> >>>>>>>> On 2018/12/6 下午2:35,elohimes@gmail.com  wrote:
> >>>>>>>>> From: Xie Yongji<xieyongji@baidu.com>
> >>>>>>>>>
> >>>>>>>>> This patchset is aimed at supporting qemu to reconnect
> >>>>>>>>> vhost-user-blk backend after vhost-user-blk backend crash or
> >>>>>>>>> restart.
> >>>>>>>>>
> >>>>>>>>> The patch 1 tries to implenment the sync connection for
> >>>>>>>>> "reconnect socket".
> >>>>>>>>>
> >>>>>>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> >>>>>>>>> to support offering shared memory to backend to record
> >>>>>>>>> its inflight I/O.
> >>>>>>>>>
> >>>>>>>>> The patch 3,4 are the corresponding libvhost-user patches of
> >>>>>>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> >>>>>>>>>
> >>>>>>>>> The patch 5 supports vhost-user-blk to reconnect backend when
> >>>>>>>>> connection closed.
> >>>>>>>>>
> >>>>>>>>> The patch 6 tells qemu that we support reconnecting now.
> >>>>>>>>>
> >>>>>>>>> To use it, we could start qemu with:
> >>>>>>>>>
> >>>>>>>>> qemu-system-x86_64 \
> >>>>>>>>>              -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >>>>>>>>>              -device vhost-user-blk-pci,chardev=char0 \
> >>>>>>>>>
> >>>>>>>>> and start vhost-user-blk backend with:
> >>>>>>>>>
> >>>>>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket
> >>>>>>>>>
> >>>>>>>>> Then we can restart vhost-user-blk at any time during VM running.
> >>>>>>>> I wonder whether or not it's better to handle this at the level of virtio
> >>>>>>>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to
> >>>>>>>> driver might be sufficient?
> >>>>>>>>
> >>>>>>>> Another possible issue is, looks like you need to deal with different kinds
> >>>>>>>> of ring layouts e.g packed virtqueues.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>> I'm not sure I understand your comments here.
> >>>>>>> All these would be guest-visible extensions.
> >>>>>> Looks not, it only introduces a shared memory between qemu and
> >>>>>> vhost-user backend?
> >>>>>>
> >>>>>>
> >>>>>>> Possible for sure but how is this related to
> >>>>>>> a patch supporting transparent reconnects?
> >>>>>> I might miss something. My understanding is that we support transparent
> >>>>>> reconnects, but we can't deduce an accurate last_avail_idx and this is
> >>>>>> what capability this series try to add. To me, this series is functional
> >>>>>> equivalent to expose last_avail_idx (or avail_idx_cons) in available
> >>>>>> ring. So the information is inside guest memory, vhost-user backend can
> >>>>>> access it and update it directly. I believe this is some modern NIC did
> >>>>>> as well (but index is in MMIO area of course).
> >>>>>>
> >>>>> Hi Jason,
> >>>>>
> >>>>> If my understand is correct, it might be not enough to only expose
> >>>>> last_avail_idx.
> >>>>> Because we would not process descriptors in the same order in which they have
> >>>>> been made available sometimes. If so, we can't get correct inflight
> >>>>> I/O information
> >>>>> from available ring.
> >>>> You can get this with the help of the both used ring and last_avail_idx
> >>>> I believe. Or maybe you can give us an example?
> >>>>
> >>> A simple example, we assume ring size is 8:
> >>>
> >>> 1. guest fill avail ring
> >>>
> >>> avail ring: 0 1 2 3 4 5 6 7
> >>> used ring:
> >>>
> >>> 2. vhost-user backend complete 4,5,6,7 and fill used ring
> >>>
> >>> avail ring: 0 1 2 3 4 5 6 7
> >>> used ring: 4 5 6 7
> >>>
> >>> 3. guest fill avail ring again
> >>>
> >>> avail ring: 4 5 6 7 4 5 6 7
> >>> used ring: 4 5 6 7
> >>>
> >>> 4. vhost-user backend crash
> >>>
> >>> The inflight descriptors 0, 1, 2, 3 lost.
> >>>
> >>> Thanks,
> >>> Yongji
> >>
> >> Ok, then we can simply forbid increasing the avail_idx in this case?
> >>
> >> Basically, it's a question of whether or not it's better to done it in
> >> the level of virtio instead of vhost. I'm pretty sure if we expose
> >> sufficient information, it could be done without touching vhost-user.
> >> And we won't deal with e.g migration and other cases.
> >>
> > OK, I get your point. That's indeed an alternative way. But this feature seems
> > to be only useful to vhost-user backend.
>
>
> I admit I could not think of a use case other than vhost-user.
>
>
> >   I'm not sure whether it make sense to
> > touch virtio protocol for this feature.
>
>
> Some possible advantages:
>
> - Feature could be determined and noticed by user or management layer.
>
> - There's no need to invent ring layout specific protocol to record in
> flight descriptors. E.g if my understanding is correct, for this series
> and for the example above, it still can not work for packed virtqueue
> since descriptor id is not sufficient (descriptor could be overwritten
> by used one). You probably need to have a (partial) copy of descriptor
> ring for this.
>
> - No need to deal with migration, all information was in guest memory.
>

Yes, those are advantages. But it seems that handling this at the
vhost-user level would be easier to maintain in a production
environment: we can support old guests, and bug fixes will not depend
on guest kernel updates.

Thanks,
Yongji
Jason Wang Dec. 13, 2018, 2:58 a.m. UTC | #19
On 2018/12/12 5:18 PM, Yongji Xie wrote:
>>>> Ok, then we can simply forbid increasing the avail_idx in this case?
>>>>
>>>> Basically, it's a question of whether or not it's better to done it in
>>>> the level of virtio instead of vhost. I'm pretty sure if we expose
>>>> sufficient information, it could be done without touching vhost-user.
>>>> And we won't deal with e.g migration and other cases.
>>>>
>>> OK, I get your point. That's indeed an alternative way. But this feature seems
>>> to be only useful to vhost-user backend.
>> I admit I could not think of a use case other than vhost-user.
>>
>>
>>>    I'm not sure whether it make sense to
>>> touch virtio protocol for this feature.
>> Some possible advantages:
>>
>> - Feature could be determined and noticed by user or management layer.
>>
>> - There's no need to invent ring layout specific protocol to record in
>> flight descriptors. E.g if my understanding is correct, for this series
>> and for the example above, it still can not work for packed virtqueue
>> since descriptor id is not sufficient (descriptor could be overwritten
>> by used one). You probably need to have a (partial) copy of descriptor
>> ring for this.
>>
>> - No need to deal with migration, all information was in guest memory.
>>
> Yes, we have those advantages. But seems like handle this in vhost-user
> level could be easier to be maintained in production environment. We can
> support old guest. And the bug fix will not depend on guest kernel updating.


Yes. But my main concern is the layout-specific data structure. If
it could be done through a generic structure (can it?), that would be
fine. Otherwise, I believe we don't want another negotiation about
what kind of layout the backend supports for reconnect.

Thanks


>
> Thanks,
> Yongji
Yongji Xie Dec. 13, 2018, 3:41 a.m. UTC | #20
On Thu, 13 Dec 2018 at 10:58, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2018/12/12 下午5:18, Yongji Xie wrote:
> >>>> Ok, then we can simply forbid increasing the avail_idx in this case?
> >>>>
> >>>> Basically, it's a question of whether or not it's better to done it in
> >>>> the level of virtio instead of vhost. I'm pretty sure if we expose
> >>>> sufficient information, it could be done without touching vhost-user.
> >>>> And we won't deal with e.g migration and other cases.
> >>>>
> >>> OK, I get your point. That's indeed an alternative way. But this feature seems
> >>> to be only useful to vhost-user backend.
> >> I admit I could not think of a use case other than vhost-user.
> >>
> >>
> >>>    I'm not sure whether it make sense to
> >>> touch virtio protocol for this feature.
> >> Some possible advantages:
> >>
> >> - Feature could be determined and noticed by user or management layer.
> >>
> >> - There's no need to invent ring layout specific protocol to record in
> >> flight descriptors. E.g if my understanding is correct, for this series
> >> and for the example above, it still can not work for packed virtqueue
> >> since descriptor id is not sufficient (descriptor could be overwritten
> >> by used one). You probably need to have a (partial) copy of descriptor
> >> ring for this.
> >>
> >> - No need to deal with migration, all information was in guest memory.
> >>
> > Yes, we have those advantages. But seems like handle this in vhost-user
> > level could be easier to be maintained in production environment. We can
> > support old guest. And the bug fix will not depend on guest kernel updating.
>
>
> Yes. But the my main concern is the layout specific data structure. If
> it could be done through a generic structure (can it?), it would be
> fine. Otherwise, I believe we don't want another negotiation about what
> kind of layout that backend support for reconnect.
>

Yes, the current layout in shared memory doesn't support packed
virtqueues, because the information for a descriptor in the
descriptor ring is no longer available once the device fetches it.

I also thought about a generic structure before, but I failed... So I
tried another way to achieve that in this series. On the QEMU side,
we just provide a shared memory region to the backend and define
nothing about this memory. On the backend side, the backend knows how
to use that memory to record inflight I/O, no matter what kind of
virtqueue it uses. Thus, if the virtqueue is updated for a new virtio
spec in the future, we don't need to touch QEMU or the guest. What do
you think?

Thanks,
Yongji
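
A minimal sketch of the QEMU-side allocation this approach implies
(an assumption, not the series' actual code: a plain memfd whose fd
is passed to the backend with VHOST_USER_SET_VRING_INFLIGHT; the
region name and the helper are hypothetical):

#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate an opaque shared region; QEMU defines nothing about its
 * contents and only keeps the fd/mapping alive across reconnects. */
static int alloc_inflight_region(size_t size, void **addr)
{
    int fd = memfd_create("vhost-inflight", 0);  /* hypothetical name */

    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (*addr == MAP_FAILED) {
        close(fd);
        return -1;
    }
    return fd;   /* this fd is what travels over the vhost-user socket */
}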
Michael S. Tsirkin Dec. 13, 2018, 2:45 p.m. UTC | #21
On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohimes@gmail.com wrote:
> From: Xie Yongji <xieyongji@baidu.com>
> 
> [cover letter snipped]

What is missing in all this is documentation.
Specifically docs/interop/vhost-user.txt.

At a high level the design is IMO a good one.

However I would prefer reading the protocol first before
the code.

So here's what I managed to figure out, and it matches
how I imagined it would work when I was still
thinking about out of order for net:

- backend allocates memory to keep its stuff around
- sends it to qemu so it can maintain it
- gets it back on reconnect

The format, size, etc. are all up to the backend; a good implementation
would probably include some kind of versioning.

Is this what this implements?
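
To make the reconnect step concrete, here is a hedged sketch of what a
backend might do with the region it gets back (the table layout and the
resubmit callback are hypothetical, not taken from this series):

#include <stdint.h>

/* Hypothetical per-queue inflight table: one flag per descriptor id,
 * set when a request is fetched and cleared when it completes. */
typedef struct VuInflightQueue {
    uint16_t desc_num;
    uint8_t  inflight[];  /* indexed by descriptor id */
} VuInflightQueue;

/* After reconnect, walk the table QEMU handed back and resubmit every
 * request that was still pending when the backend went away. */
static void resubmit_inflight(VuInflightQueue *q,
                              void (*resubmit_one)(uint16_t desc_id))
{
    for (uint16_t i = 0; i < q->desc_num; i++) {
        if (q->inflight[i]) {
            resubmit_one(i);
        }
    }
}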

Michael S. Tsirkin Dec. 13, 2018, 2:56 p.m. UTC | #22
On Thu, Dec 13, 2018 at 11:41:06AM +0800, Yongji Xie wrote:
> On Thu, 13 Dec 2018 at 10:58, Jason Wang <jasowang@redhat.com> wrote:
> >
> > [earlier discussion snipped]
> >
> > Yes. But the my main concern is the layout specific data structure. If
> > it could be done through a generic structure (can it?), it would be
> > fine. Otherwise, I believe we don't want another negotiation about what
> > kind of layout that backend support for reconnect.
> >
> 
> Yes, the current layout in shared memory doesn't support the packed
> virtqueue, because the information for a descriptor in the descriptor
> ring is no longer available once the device has fetched it.
> 
> I also thought about a generic structure before, but I failed. So in
> this series I tried another way to achieve that: on the QEMU side we
> just provide a shared memory region to the backend and don't define
> anything about its contents; on the backend side, the backend knows how
> to use that memory to record inflight I/O, whatever kind of virtqueue
> it uses. Thus, if the virtqueue is updated for a new virtio spec in the
> future, we don't need to touch QEMU or the guest. What do you think
> about it?
> 
> Thanks,
> Yongji

I think that's a good direction to take, yes.
Backends need to be very careful about the layout,
with versioning etc.
Yongji Xie Dec. 14, 2018, 1:56 a.m. UTC | #23
On Thu, 13 Dec 2018 at 22:45, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohimes@gmail.com wrote:
> > [cover letter snipped]
>
> What is missing in all this is documentation.
> Specifically docs/interop/vhost-user.txt.
>
> At a high level the design is IMO a good one.
>
> However I would prefer reading the protocol first before
> the code.
>
> So here's what I managed to figure out, and it matches
> how I imagined it would work when I was still
> thinking about out of order for net:
>
> - backend allocates memory to keep its stuff around
> - sends it to qemu so it can maintain it
> - gets it back on reconnect
>
> The format, size, etc. are all up to the backend; a good implementation
> would probably include some kind of versioning.
>
> Is this what this implements?
>

Definitely, yes. And the comments look good to me. QEMU gets the size
and version from the backend, then allocates the memory and sends it
back along with the version. The backend knows how to use the memory
according to the version. If we do that, should we allocate the memory
per device rather than per virtqueue?
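
As a rough sketch of what that negotiation payload could carry (the
series as posted only defines VHOST_USER_SET_VRING_INFLIGHT; the
get/set split and every name below are hypothetical):

#include <stdint.h>

/* Hypothetical payload for the negotiation described above. On "get",
 * the backend fills in the size and version it wants; on "set", QEMU
 * sends the same struct back together with an fd for the region it
 * allocated. */
typedef struct VhostUserInflight {
    uint64_t mmap_size;    /* region size requested by the backend */
    uint64_t mmap_offset;  /* offset of the region within the fd */
    uint16_t version;      /* backend-defined layout version */
    uint16_t queue_idx;    /* virtqueue this region belongs to */
} VhostUserInflight;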

Thanks,
Yongji
Michael S. Tsirkin Dec. 14, 2018, 2:20 a.m. UTC | #24
On Fri, Dec 14, 2018 at 09:56:41AM +0800, Yongji Xie wrote:
> On Thu, 13 Dec 2018 at 22:45, Michael S. Tsirkin <mst@redhat.com> wrote:
> > [snipped]
>
> Definitely, yes. And the comments look good to me. QEMU gets the size
> and version from the backend, then allocates the memory and sends it
> back along with the version. The backend knows how to use the memory
> according to the version. If we do that, should we allocate the memory
> per device rather than per virtqueue?
> 
> Thanks,
> Yongji

It's up to you. Maybe both.
Yongji Xie Dec. 14, 2018, 2:33 a.m. UTC | #25
On Fri, 14 Dec 2018 at 10:20, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Dec 14, 2018 at 09:56:41AM +0800, Yongji Xie wrote:
> > [snipped]
> >
> > If we do that, should we allocate the memory per device rather than
> > per virtqueue?
>
> It's up to you. Maybe both.
>

OK. I think I may still keep it at the virtqueue level in v2. Thank you.

Thanks,
Yongji
Jason Wang Dec. 14, 2018, 4:36 a.m. UTC | #26
On 2018/12/13 10:56 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 13, 2018 at 11:41:06AM +0800, Yongji Xie wrote:
>> [snipped]
> I think that's a good direction to take, yes.
> Backends need to be very careful about the layout,
> with versioning etc.


I'm not sure this can be done 100% transparently to qemu. E.g. you need
to deal with reset, I think, and you need to carefully choose the size
of the region, which means you need to negotiate the size and layout
with the backend. And you need to deal with migration as well. This is
another sin of splitting the virtio dataplane from qemu anyway.


Thanks


Michael S. Tsirkin Dec. 14, 2018, 1:31 p.m. UTC | #27
On Fri, Dec 14, 2018 at 12:36:01PM +0800, Jason Wang wrote:
>
> On 2018/12/13 10:56 PM, Michael S. Tsirkin wrote:
> > [snipped]
> >
> > I think that's a good direction to take, yes.
> > Backends need to be very careful about the layout,
> > with versioning etc.
>
> I'm not sure this can be done 100% transparently to qemu. E.g. you need
> to deal with reset, I think, and you need to carefully choose the size
> of the region, which means you need to negotiate the size and layout
> with the backend.

I am not sure I follow. The point is all this state is internal to the
backend. QEMU does not care at all - it just helps a little by hanging
on to it.

> And you need to deal with migration as well.

Good catch.
There definitely is an issue in that you cannot migrate with the backend
disconnected: migration needs to flush the backend, and we can't do that
while it's disconnected. This needs to be addressed. I think it's
cleanest to just defer migration until the backend reconnects.
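
One way to express that in QEMU would be a migration blocker tied to
the connection state; a rough sketch follows (migrate_add_blocker() and
migrate_del_blocker() are existing QEMU APIs, while the two event
handlers are hypothetical):

#include "qapi/error.h"
#include "migration/blocker.h"

static Error *vub_mig_blocker;

/* Hypothetical chardev event handlers: block migration while the
 * vhost-user backend socket is down, lift the blocker on reconnect. */
static void vub_backend_disconnected(void)
{
    if (!vub_mig_blocker) {
        error_setg(&vub_mig_blocker,
                   "vhost-user-blk backend is disconnected");
        migrate_add_blocker(vub_mig_blocker, &error_abort);
    }
}

static void vub_backend_reconnected(void)
{
    if (vub_mig_blocker) {
        migrate_del_blocker(vub_mig_blocker);
        error_free(vub_mig_blocker);
        vub_mig_blocker = NULL;
    }
}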


Backend cross-version migration is all messed up in vhost-user, I agree.
There was a plan to fix it that unfortunately was never executed.
Maxime, do you still plan to look into it?

> This is another sin of splitting the virtio dataplane from qemu anyway.
> 
> 
> Thanks

It wasn't split as such - dpdk was never a part of qemu.  We just
enabled it without fuse hacks.
Michael S. Tsirkin Dec. 14, 2018, 9:23 p.m. UTC | #28
On Fri, Dec 14, 2018 at 10:33:54AM +0800, Yongji Xie wrote:
> On Fri, 14 Dec 2018 at 10:20, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > [snipped]
> > >
> > > If we do that, should we allocate the memory per device rather than
> > > per virtqueue?
> >
> > It's up to you. Maybe both.
> >
> 
> OK. I think I may still keep it at the virtqueue level in v2. Thank you.
> 
> Thanks,
> Yongji

I'd actually add options for both, and the backend can set the size to 0
if it wants to.
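
A rough sketch of what "options for both" might look like in the
backend's negotiation reply, with 0 meaning "not needed" (hypothetical
struct, not part of this series):

#include <stdint.h>

/* Hypothetical reply a backend could send when asked what inflight
 * memory it wants; either size may be 0 to opt out of that level. */
typedef struct InflightSizes {
    uint64_t per_device_size;  /* 0: no device-level region */
    uint64_t per_queue_size;   /* 0: no per-queue regions */
    uint16_t version;          /* backend-defined layout version */
} InflightSizes;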
Yongji Xie Dec. 15, 2018, 11:34 a.m. UTC | #29
On Sat, 15 Dec 2018 at 05:23, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Dec 14, 2018 at 10:33:54AM +0800, Yongji Xie wrote:
> > [snipped]
>
> I'd actually add options for both, and the backend can set the size to 0
> if it wants to.
>
OK, let me try it in v2.

Thanks,
Yongji