[RFC,02/10] vhost: add 3 commands for vhost-vdpa

Message ID: 20220105005900.860-3-longpeng2@huawei.com (mailing list archive)
State: New, archived
Series: add generic vDPA device support

Commit Message

Zhijian Li (Fujitsu)" via Jan. 5, 2022, 12:58 a.m. UTC
From: Longpeng <longpeng2@huawei.com>

To support the generic vdpa device, we need to add the following ioctls:
- GET_VECTORS_NUM: the count of vectors that are supported
- GET_CONFIG_SIZE: the size of the virtio config space
- GET_VQS_NUM: the count of virtqueues that are exported

Signed-off-by: Longpeng <longpeng2@huawei.com>
---
 linux-headers/linux/vhost.h | 10 ++++++++++
 1 file changed, 10 insertions(+)
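
As a minimal sketch (not part of the patch), userspace would query the
three proposed ioctls like this, assuming a device node such as
/dev/vhost-vdpa-0 and the header change below:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/vhost.h>

    int main(void)
    {
        int nvectors, config_size, nvqs;
        int fd = open("/dev/vhost-vdpa-0", O_RDWR);

        if (fd < 0)
            return 1;
        /* each ioctl fills a plain int, matching the _IOR(..., int) definitions */
        if (ioctl(fd, VHOST_VDPA_GET_VECTORS_NUM, &nvectors) < 0 ||
            ioctl(fd, VHOST_VDPA_GET_CONFIG_SIZE, &config_size) < 0 ||
            ioctl(fd, VHOST_VDPA_GET_VQS_NUM, &nvqs) < 0)
            return 1;
        printf("vectors=%d config_size=%d vqs=%d\n", nvectors, config_size, nvqs);
        return 0;
    }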

Comments

Jason Wang Jan. 5, 2022, 4:35 a.m. UTC | #1
On Wed, Jan 5, 2022 at 8:59 AM Longpeng(Mike) <longpeng2@huawei.com> wrote:
>
> From: Longpeng <longpeng2@huawei.com>
>
> To support the generic vdpa device, we need to add the following ioctls:
> - GET_VECTORS_NUM: the count of vectors that are supported

Does this mean MSI vectors? If yes, it looks like a layer violation:
vhost is transport independent.  And it reveals device implementation
details which block (cross vendor) migration.

Thanks

> - GET_CONFIG_SIZE: the size of the virtio config space
> - GET_VQS_NUM: the count of virtqueues that are exported
>
> Signed-off-by: Longpeng <longpeng2@huawei.com>
> ---
>  linux-headers/linux/vhost.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> index c998860d7b..c5edd75d15 100644
> --- a/linux-headers/linux/vhost.h
> +++ b/linux-headers/linux/vhost.h
> @@ -150,4 +150,14 @@
>  /* Get the valid iova range */
>  #define VHOST_VDPA_GET_IOVA_RANGE      _IOR(VHOST_VIRTIO, 0x78, \
>                                              struct vhost_vdpa_iova_range)
> +
> +/* Get the number of vectors */
> +#define VHOST_VDPA_GET_VECTORS_NUM     _IOR(VHOST_VIRTIO, 0x79, int)
> +
> +/* Get the virtio config size */
> +#define VHOST_VDPA_GET_CONFIG_SIZE     _IOR(VHOST_VIRTIO, 0x80, int)
> +
> +/* Get the number of virtqueues */
> +#define VHOST_VDPA_GET_VQS_NUM         _IOR(VHOST_VIRTIO, 0x81, int)
> +
>  #endif
> --
> 2.23.0
>
Zhijian Li (Fujitsu)" via Jan. 5, 2022, 6:40 a.m. UTC | #2
> -----Original Message-----
> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Wednesday, January 5, 2022 12:36 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpeng2@huawei.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>; mst <mst@redhat.com>; Stefano
> Garzarella <sgarzare@redhat.com>; Cornelia Huck <cohuck@redhat.com>; pbonzini
> <pbonzini@redhat.com>; Gonglei (Arei) <arei.gonglei@huawei.com>; Yechuan
> <yechuan@huawei.com>; Huangzhichao <huangzhichao@huawei.com>; qemu-devel
> <qemu-devel@nongnu.org>
> Subject: Re: [RFC 02/10] vhost: add 3 commands for vhost-vdpa
> 
> On Wed, Jan 5, 2022 at 8:59 AM Longpeng(Mike) <longpeng2@huawei.com> wrote:
> >
> > From: Longpeng <longpeng2@huawei.com>
> >
> > To support the generic vdpa device, we need to add the following ioctls:
> > - GET_VECTORS_NUM: the count of vectors that are supported
> 
> Does this mean MSI vectors? If yes, it looks like a layer violation:
> vhost is transport independent.  And it reveals device implementation
> details which block (cross vendor) migration.
> 

Can we set the VirtIOPCIProxy.nvectors to "the count of virtqueues + 1 (config)"?
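
A minimal sketch of that idea (assumed names; hypothetical realize-time
code on the QEMU side, using the GET_VQS_NUM ioctl from this patch):

    /* hypothetical: derive the vector count from the vq count */
    int nvqs;

    if (ioctl(v->device_fd, VHOST_VDPA_GET_VQS_NUM, &nvqs) < 0) {
        return;
    }
    proxy->nvectors = nvqs + 1; /* one per virtqueue, plus one for config */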

> Thanks
> 
> > - GET_CONFIG_SIZE: the size of the virtio config space
> > - GET_VQS_NUM: the count of virtqueues that are exported
> >
> > Signed-off-by: Longpeng <longpeng2@huawei.com>
> > ---
> >  linux-headers/linux/vhost.h | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> > index c998860d7b..c5edd75d15 100644
> > --- a/linux-headers/linux/vhost.h
> > +++ b/linux-headers/linux/vhost.h
> > @@ -150,4 +150,14 @@
> >  /* Get the valid iova range */
> >  #define VHOST_VDPA_GET_IOVA_RANGE      _IOR(VHOST_VIRTIO, 0x78, \
> >                                              struct vhost_vdpa_iova_range)
> > +
> > +/* Get the number of vectors */
> > +#define VHOST_VDPA_GET_VECTORS_NUM     _IOR(VHOST_VIRTIO, 0x79, int)
> > +
> > +/* Get the virtio config size */
> > +#define VHOST_VDPA_GET_CONFIG_SIZE     _IOR(VHOST_VIRTIO, 0x80, int)
> > +
> > +/* Get the number of virtqueues */
> > +#define VHOST_VDPA_GET_VQS_NUM         _IOR(VHOST_VIRTIO, 0x81, int)
> > +
> >  #endif
> > --
> > 2.23.0
> >
Jason Wang Jan. 5, 2022, 6:43 a.m. UTC | #3
On 2022/1/5 2:40 PM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
>
>> -----Original Message-----
>> From: Jason Wang [mailto:jasowang@redhat.com]
>> Sent: Wednesday, January 5, 2022 12:36 PM
>> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
>> <longpeng2@huawei.com>
>> Cc: Stefan Hajnoczi <stefanha@redhat.com>; mst <mst@redhat.com>; Stefano
>> Garzarella <sgarzare@redhat.com>; Cornelia Huck <cohuck@redhat.com>; pbonzini
>> <pbonzini@redhat.com>; Gonglei (Arei) <arei.gonglei@huawei.com>; Yechuan
>> <yechuan@huawei.com>; Huangzhichao <huangzhichao@huawei.com>; qemu-devel
>> <qemu-devel@nongnu.org>
>> Subject: Re: [RFC 02/10] vhost: add 3 commands for vhost-vdpa
>>
>> On Wed, Jan 5, 2022 at 8:59 AM Longpeng(Mike) <longpeng2@huawei.com> wrote:
>>> From: Longpeng <longpeng2@huawei.com>
>>>
>>> To support the generic vdpa device, we need to add the following ioctls:
>>> - GET_VECTORS_NUM: the count of vectors that are supported
>> Does this mean MSI vectors? If yes, it looks like a layer violation:
>> vhost is transport independent.  And it reveals device implementation
>> details which block (cross vendor) migration.
>>
> Can we set the VirtIOPCIProxy.nvectors to "the count of virtqueues + 1 (config)" ?


That should work.

Thanks


>
>> Thanks
>>
>>> - GET_CONFIG_SIZE: the size of the virtio config space
>>> - GET_VQS_NUM: the count of virtqueues that are exported
>>>
>>> Signed-off-by: Longpeng <longpeng2@huawei.com>
>>> ---
>>>   linux-headers/linux/vhost.h | 10 ++++++++++
>>>   1 file changed, 10 insertions(+)
>>>
>>> diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
>>> index c998860d7b..c5edd75d15 100644
>>> --- a/linux-headers/linux/vhost.h
>>> +++ b/linux-headers/linux/vhost.h
>>> @@ -150,4 +150,14 @@
>>>   /* Get the valid iova range */
>>>   #define VHOST_VDPA_GET_IOVA_RANGE      _IOR(VHOST_VIRTIO, 0x78, \
>>>                                               struct vhost_vdpa_iova_range)
>>> +
>>> +/* Get the number of vectors */
>>> +#define VHOST_VDPA_GET_VECTORS_NUM     _IOR(VHOST_VIRTIO, 0x79, int)
>>> +
>>> +/* Get the virtio config size */
>>> +#define VHOST_VDPA_GET_CONFIG_SIZE     _IOR(VHOST_VIRTIO, 0x80, int)
>>> +
>>> +/* Get the number of virtqueues */
>>> +#define VHOST_VDPA_GET_VQS_NUM         _IOR(VHOST_VIRTIO, 0x81, int)
>>> +
>>>   #endif
>>> --
>>> 2.23.0
>>>
Michael S. Tsirkin Jan. 5, 2022, 7:02 a.m. UTC | #4
On Wed, Jan 05, 2022 at 12:35:53PM +0800, Jason Wang wrote:
> On Wed, Jan 5, 2022 at 8:59 AM Longpeng(Mike) <longpeng2@huawei.com> wrote:
> >
> > From: Longpeng <longpeng2@huawei.com>
> >
> > To support the generic vdpa device, we need to add the following ioctls:
> > - GET_VECTORS_NUM: the count of vectors that are supported
> 
> Does this mean MSI vectors? If yes, it looks like a layer violation:
> vhost is transport independent.

Well, the *guest* needs to know how many vectors the device supports.
I don't think there's a way around that. Do you?
Otherwise guests will at best be suboptimal.

>  And it reveals device implementation
> details which block (cross vendor) migration.
> 
> Thanks

Not necessarily, userspace can hide this from the guest if it
wants to, just validate.
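
As a sketch of that validation (all names hypothetical; error_setg() is
QEMU's usual error-reporting helper from qapi/error.h), QEMU could compare
what the hardware reports against what it wants to expose:

    /* hypothetical check, e.g. called from the device's realize() */
    static bool validate_vectors(int hw_nvectors, int nvqs, Error **errp)
    {
        if (hw_nvectors < nvqs + 1) {
            error_setg(errp, "vDPA device supports only %d vectors", hw_nvectors);
            return false;
        }
        return true;
    }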


> > - GET_CONFIG_SIZE: the size of the virtio config space
> > - GET_VQS_NUM: the count of virtqueues that are exported
> >
> > Signed-off-by: Longpeng <longpeng2@huawei.com>
> > ---
> >  linux-headers/linux/vhost.h | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> > index c998860d7b..c5edd75d15 100644
> > --- a/linux-headers/linux/vhost.h
> > +++ b/linux-headers/linux/vhost.h
> > @@ -150,4 +150,14 @@
> >  /* Get the valid iova range */
> >  #define VHOST_VDPA_GET_IOVA_RANGE      _IOR(VHOST_VIRTIO, 0x78, \
> >                                              struct vhost_vdpa_iova_range)
> > +
> > +/* Get the number of vectors */
> > +#define VHOST_VDPA_GET_VECTORS_NUM     _IOR(VHOST_VIRTIO, 0x79, int)
> > +
> > +/* Get the virtio config size */
> > +#define VHOST_VDPA_GET_CONFIG_SIZE     _IOR(VHOST_VIRTIO, 0x80, int)
> > +
> > +/* Get the number of virtqueues */
> > +#define VHOST_VDPA_GET_VQS_NUM         _IOR(VHOST_VIRTIO, 0x81, int)
> > +
> >  #endif
> > --
> > 2.23.0
> >
Jason Wang Jan. 5, 2022, 7:54 a.m. UTC | #5
On Wed, Jan 5, 2022 at 3:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jan 05, 2022 at 12:35:53PM +0800, Jason Wang wrote:
> > On Wed, Jan 5, 2022 at 8:59 AM Longpeng(Mike) <longpeng2@huawei.com> wrote:
> > >
> > > From: Longpeng <longpeng2@huawei.com>
> > >
> > > To support the generic vdpa device, we need to add the following ioctls:
> > > - GET_VECTORS_NUM: the count of vectors that are supported
> >
> > Does this mean MSI vectors? If yes, it looks like a layer violation:
> > vhost is transport independent.
>
> Well *guest* needs to know how many vectors device supports.
> I don't think there's a way around that. Do you?

We have VHOST_SET_VRING/CONFIG_CALL which is per vq. I think we can
simply assume #vqs + 1?

> Otherwise guests will at best be suboptimal.
>
> >  And it reveals device implementation
> > details which block (cross vendor) migration.
> >
> > Thanks
>
> Not necessarily, userspace can hide this from guest if it
> wants to, just validate.

If we can hide it at vhost/uAPI level, it would be even better?

Thanks

>
>
> > > - GET_CONFIG_SIZE: the size of the virtio config space
> > > - GET_VQS_NUM: the count of virtqueues that are exported
> > >
> > > Signed-off-by: Longpeng <longpeng2@huawei.com>
> > > ---
> > >  linux-headers/linux/vhost.h | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> > > index c998860d7b..c5edd75d15 100644
> > > --- a/linux-headers/linux/vhost.h
> > > +++ b/linux-headers/linux/vhost.h
> > > @@ -150,4 +150,14 @@
> > >  /* Get the valid iova range */
> > >  #define VHOST_VDPA_GET_IOVA_RANGE      _IOR(VHOST_VIRTIO, 0x78, \
> > >                                              struct vhost_vdpa_iova_range)
> > > +
> > > +/* Get the number of vectors */
> > > +#define VHOST_VDPA_GET_VECTORS_NUM     _IOR(VHOST_VIRTIO, 0x79, int)
> > > +
> > > +/* Get the virtio config size */
> > > +#define VHOST_VDPA_GET_CONFIG_SIZE     _IOR(VHOST_VIRTIO, 0x80, int)
> > > +
> > > +/* Get the number of virtqueues */
> > > +#define VHOST_VDPA_GET_VQS_NUM         _IOR(VHOST_VIRTIO, 0x81, int)
> > > +
> > >  #endif
> > > --
> > > 2.23.0
> > >
>
Zhijian Li (Fujitsu)" via Jan. 5, 2022, 8:37 a.m. UTC | #6
> -----Original Message-----
> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Wednesday, January 5, 2022 3:54 PM
> To: Michael S. Tsirkin <mst@redhat.com>
> Cc: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpeng2@huawei.com>; Stefan Hajnoczi <stefanha@redhat.com>; Stefano
> Garzarella <sgarzare@redhat.com>; Cornelia Huck <cohuck@redhat.com>; pbonzini
> <pbonzini@redhat.com>; Gonglei (Arei) <arei.gonglei@huawei.com>; Yechuan
> <yechuan@huawei.com>; Huangzhichao <huangzhichao@huawei.com>; qemu-devel
> <qemu-devel@nongnu.org>
> Subject: Re: [RFC 02/10] vhost: add 3 commands for vhost-vdpa
> 
> On Wed, Jan 5, 2022 at 3:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jan 05, 2022 at 12:35:53PM +0800, Jason Wang wrote:
> > > On Wed, Jan 5, 2022 at 8:59 AM Longpeng(Mike) <longpeng2@huawei.com> wrote:
> > > >
> > > > From: Longpeng <longpeng2@huawei.com>
> > > >
> > > > To support the generic vdpa device, we need to add the following ioctls:
> > > > - GET_VECTORS_NUM: the count of vectors that are supported
> > >
> > > Does this mean MSI vectors? If yes, it looks like a layer violation:
> > > vhost is transport independent.
> >
> > Well *guest* needs to know how many vectors device supports.
> > I don't think there's a way around that. Do you?
> 
> We have VHOST_SET_VRING/CONFIG_CALL which is per vq. I think we can
> simply assume #vqs + 1?
> 
> > Otherwise guests will at best be suboptimal.
> >
> > >  And it reveals device implementation
> > > details which block (cross vendor) migration.
> > >
> > > Thanks
> >
> > Not necessarily, userspace can hide this from guest if it
> > wants to, just validate.
> 
> If we can hide it at vhost/uAPI level, it would be even better?
> 

Not only MSI vectors, but also queue-size, #vqs, etc.

Maybe the vhost level could expose the hardware's real capabilities
and let the userspace (QEMU) do the hiding? The userspace knows how
to process them.

> Thanks
> 
> >
> >
> > > > - GET_CONFIG_SIZE: the size of the virtio config space
> > > > - GET_VQS_NUM: the count of virtqueues that are exported
> > > >
> > > > Signed-off-by: Longpeng <longpeng2@huawei.com>
> > > > ---
> > > >  linux-headers/linux/vhost.h | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > >
> > > > diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> > > > index c998860d7b..c5edd75d15 100644
> > > > --- a/linux-headers/linux/vhost.h
> > > > +++ b/linux-headers/linux/vhost.h
> > > > @@ -150,4 +150,14 @@
> > > >  /* Get the valid iova range */
> > > >  #define VHOST_VDPA_GET_IOVA_RANGE      _IOR(VHOST_VIRTIO, 0x78, \
> > > >                                              struct vhost_vdpa_iova_range)
> > > > +
> > > > +/* Get the number of vectors */
> > > > +#define VHOST_VDPA_GET_VECTORS_NUM     _IOR(VHOST_VIRTIO, 0x79, int)
> > > > +
> > > > +/* Get the virtio config size */
> > > > +#define VHOST_VDPA_GET_CONFIG_SIZE     _IOR(VHOST_VIRTIO, 0x80, int)
> > > > +
> > > > +/* Get the number of virtqueues */
> > > > +#define VHOST_VDPA_GET_VQS_NUM         _IOR(VHOST_VIRTIO, 0x81, int)
> > > > +
> > > >  #endif
> > > > --
> > > > 2.23.0
> > > >
> >
Jason Wang Jan. 5, 2022, 9:09 a.m. UTC | #7
On Wed, Jan 5, 2022 at 4:37 PM Longpeng (Mike, Cloud Infrastructure
Service Product Dept.) <longpeng2@huawei.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jason Wang [mailto:jasowang@redhat.com]
> > Sent: Wednesday, January 5, 2022 3:54 PM
> > To: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > <longpeng2@huawei.com>; Stefan Hajnoczi <stefanha@redhat.com>; Stefano
> > Garzarella <sgarzare@redhat.com>; Cornelia Huck <cohuck@redhat.com>; pbonzini
> > <pbonzini@redhat.com>; Gonglei (Arei) <arei.gonglei@huawei.com>; Yechuan
> > <yechuan@huawei.com>; Huangzhichao <huangzhichao@huawei.com>; qemu-devel
> > <qemu-devel@nongnu.org>
> > Subject: Re: [RFC 02/10] vhost: add 3 commands for vhost-vdpa
> >
> > On Wed, Jan 5, 2022 at 3:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Jan 05, 2022 at 12:35:53PM +0800, Jason Wang wrote:
> > > > On Wed, Jan 5, 2022 at 8:59 AM Longpeng(Mike) <longpeng2@huawei.com> wrote:
> > > > >
> > > > > From: Longpeng <longpeng2@huawei.com>
> > > > >
> > > > > To support the generic vdpa device, we need to add the following ioctls:
> > > > > - GET_VECTORS_NUM: the count of vectors that are supported
> > > >
> > > > Does this mean MSI vectors? If yes, it looks like a layer violation:
> > > > vhost is transport independent.
> > >
> > > Well *guest* needs to know how many vectors device supports.
> > > I don't think there's a way around that. Do you?
> >
> > We have VHOST_SET_VRING/CONFIG_CALL which is per vq. I think we can
> > simply assume #vqs + 1?
> >
> > > Otherwise guests will at best be suboptimal.
> > >
> > > >  And it reveals device implementation
> > > > details which block (cross vendor) migration.
> > > >
> > > > Thanks
> > >
> > > Not necessarily, userspace can hide this from guest if it
> > > wants to, just validate.
> >
> > If we can hide it at vhost/uAPI level, it would be even better?
> >
>
> Not only MSI vectors, but also queue-size, #vqs, etc.

MSI is PCI-specific; we have non-PCI vDPA parents, e.g. VDUSE/simulator/mlx5.

And it's something that is not guaranteed to stay unchanged. E.g. some
drivers may choose to allocate MSI during set_status(), which can fail
for various reasons.

>
> Maybe the vhost level could expose the hardware's real capabilities
> and let the userspace (QEMU) do the hiding? The userspace know how
> to process them.

#MSI vectors is much easier to mediate than queue-size and #vqs.

For interrupts, we already have VHOST_SET_X_KICK; we can keep
allocating eventfds based on #MSI vectors to make it work with any
number of MSI vectors that the virtual device has.

For queue-size, it's OK to have a new uAPI but it's not a must; Qemu
can simply fail if SET_VRING_NUM fails.

For #vqs, it's OK to have a new uAPI since the emulated virtio-pci
device requires knowledge of the #vqs in the config space. (Still not a
must; we can enumerate #vqs per device type.)

For the config size, it's OK but not a must; technically we can simply
relay what the guest writes to vhost-vdpa. It's just that current Qemu
requires it during virtio device initialization.
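
As a sketch of the interrupt point above, the existing per-vq uAPI is
enough to wire one eventfd per guest-visible vector. set_call_fd() is a
hypothetical helper; struct vhost_vring_file and VHOST_SET_VRING_CALL are
the existing vhost uAPI:

    #include <sys/ioctl.h>
    #include <linux/vhost.h>

    /* back vq 'index' with the eventfd allocated for its guest MSI vector */
    static int set_call_fd(int vdpa_fd, unsigned int index, int call_fd)
    {
        struct vhost_vring_file file = {
            .index = index,
            .fd = call_fd,
        };

        return ioctl(vdpa_fd, VHOST_SET_VRING_CALL, &file);
    }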

Thanks

>
> > Thanks
> >
> > >
> > >
> > > > > - GET_CONFIG_SIZE: the size of the virtio config space
> > > > > - GET_VQS_NUM: the count of virtqueues that are exported
> > > > >
> > > > > Signed-off-by: Longpeng <longpeng2@huawei.com>
> > > > > ---
> > > > >  linux-headers/linux/vhost.h | 10 ++++++++++
> > > > >  1 file changed, 10 insertions(+)
> > > > >
> > > > > diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> > > > > index c998860d7b..c5edd75d15 100644
> > > > > --- a/linux-headers/linux/vhost.h
> > > > > +++ b/linux-headers/linux/vhost.h
> > > > > @@ -150,4 +150,14 @@
> > > > >  /* Get the valid iova range */
> > > > >  #define VHOST_VDPA_GET_IOVA_RANGE      _IOR(VHOST_VIRTIO, 0x78, \
> > > > >                                              struct vhost_vdpa_iova_range)
> > > > > +
> > > > > +/* Get the number of vectors */
> > > > > +#define VHOST_VDPA_GET_VECTORS_NUM     _IOR(VHOST_VIRTIO, 0x79, int)
> > > > > +
> > > > > +/* Get the virtio config size */
> > > > > +#define VHOST_VDPA_GET_CONFIG_SIZE     _IOR(VHOST_VIRTIO, 0x80, int)
> > > > > +
> > > > > +/* Get the number of virtqueues */
> > > > > +#define VHOST_VDPA_GET_VQS_NUM         _IOR(VHOST_VIRTIO, 0x81, int)
> > > > > +
> > > > >  #endif
> > > > > --
> > > > > 2.23.0
> > > > >
> > >
>
Michael S. Tsirkin Jan. 5, 2022, 9:12 a.m. UTC | #8
On Wed, Jan 05, 2022 at 03:54:13PM +0800, Jason Wang wrote:
> On Wed, Jan 5, 2022 at 3:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jan 05, 2022 at 12:35:53PM +0800, Jason Wang wrote:
> > > On Wed, Jan 5, 2022 at 8:59 AM Longpeng(Mike) <longpeng2@huawei.com> wrote:
> > > >
> > > > From: Longpeng <longpeng2@huawei.com>
> > > >
> > > > To support the generic vdpa device, we need to add the following ioctls:
> > > > - GET_VECTORS_NUM: the count of vectors that are supported
> > >
> > > Does this mean MSI vectors? If yes, it looks like a layer violation:
> > > vhost is transport independent.
> >
> > Well *guest* needs to know how many vectors device supports.
> > I don't think there's a way around that. Do you?
> 
> We have VHOST_SET_VRING/CONFIG_CALL which is per vq. I think we can
> simply assume #vqs + 1?

Some hardware may be more limited. E.g. there could be a bunch of vqs
which don't need a dedicated interrupt. Or the device could support a single
interrupt shared between VQs.


> > Otherwise guests will at best be suboptimal.
> >
> > >  And it reveals device implementation
> > > details which block (cross vendor) migration.
> > >
> > > Thanks
> >
> > Not necessarily, userspace can hide this from guest if it
> > wants to, just validate.
> 
> If we can hide it at vhost/uAPI level, it would be even better?
> 
> Thanks
> 
> >
> >
> > > > - GET_CONFIG_SIZE: the size of the virtio config space
> > > > - GET_VQS_NUM: the count of virtqueues that are exported
> > > >
> > > > Signed-off-by: Longpeng <longpeng2@huawei.com>
> > > > ---
> > > >  linux-headers/linux/vhost.h | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > >
> > > > diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> > > > index c998860d7b..c5edd75d15 100644
> > > > --- a/linux-headers/linux/vhost.h
> > > > +++ b/linux-headers/linux/vhost.h
> > > > @@ -150,4 +150,14 @@
> > > >  /* Get the valid iova range */
> > > >  #define VHOST_VDPA_GET_IOVA_RANGE      _IOR(VHOST_VIRTIO, 0x78, \
> > > >                                              struct vhost_vdpa_iova_range)
> > > > +
> > > > +/* Get the number of vectors */
> > > > +#define VHOST_VDPA_GET_VECTORS_NUM     _IOR(VHOST_VIRTIO, 0x79, int)
> > > > +
> > > > +/* Get the virtio config size */
> > > > +#define VHOST_VDPA_GET_CONFIG_SIZE     _IOR(VHOST_VIRTIO, 0x80, int)
> > > > +
> > > > +/* Get the number of virtqueues */
> > > > +#define VHOST_VDPA_GET_VQS_NUM         _IOR(VHOST_VIRTIO, 0x81, int)
> > > > +
> > > >  #endif
> > > > --
> > > > 2.23.0
> > > >
> >
Jason Wang Jan. 5, 2022, 9:21 a.m. UTC | #9
On Wed, Jan 5, 2022 at 5:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jan 05, 2022 at 03:54:13PM +0800, Jason Wang wrote:
> > On Wed, Jan 5, 2022 at 3:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Jan 05, 2022 at 12:35:53PM +0800, Jason Wang wrote:
> > > > On Wed, Jan 5, 2022 at 8:59 AM Longpeng(Mike) <longpeng2@huawei.com> wrote:
> > > > >
> > > > > From: Longpeng <longpeng2@huawei.com>
> > > > >
> > > > > To support the generic vdpa device, we need to add the following ioctls:
> > > > > - GET_VECTORS_NUM: the count of vectors that are supported
> > > >
> > > > Does this mean MSI vectors? If yes, it looks like a layer violation:
> > > > vhost is transport independent.
> > >
> > > Well *guest* needs to know how many vectors device supports.
> > > I don't think there's a way around that. Do you?
> >
> > We have VHOST_SET_VRING/CONFIG_CALL which is per vq. I think we can
> > simply assume #vqs + 1?
>
> Some hardware may be more limited. E.g. there could be a bunch of vqs
> which don't need a dedicated interrupt. Or device could support a single
> interrupt shared between VQs.

Right, but in the worst case, we may just waste some eventfds?

Or if we want to optimize for the case you mentioned above, we need
to know the association between vectors and vqs, which requires more
extensions:

1) an API to know which vector a dedicated VQ belongs to (a hypothetical
   shape is sketched below)
2) the kick eventfd is set based on the vector, not the vq

And such mappings are not static, e.g. IFCVF requests MSI-X vectors
during DRIVER_OK.
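
A purely hypothetical shape for extension 1), in the style of the header
in this patch (the 0x82 code and the struct are invented for illustration):

    /* query which host vector backs a given virtqueue */
    struct vhost_vdpa_vq_vector {
        __u32 index;    /* in: virtqueue index */
        __u32 vector;   /* out: host vector bound to this vq */
    };

    #define VHOST_VDPA_GET_VQ_VECTOR    _IOWR(VHOST_VIRTIO, 0x82, \
                                              struct vhost_vdpa_vq_vector)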

Thanks

>
>
> > > Otherwise guests will at best be suboptimal.
> > >
> > > >  And it reveals device implementation
> > > > details which block (cross vendor) migration.
> > > >
> > > > Thanks
> > >
> > > Not necessarily, userspace can hide this from guest if it
> > > wants to, just validate.
> >
> > If we can hide it at vhost/uAPI level, it would be even better?
> >
> > Thanks
> >
> > >
> > >
> > > > > - GET_CONFIG_SIZE: the size of the virtio config space
> > > > > - GET_VQS_NUM: the count of virtqueues that are exported
> > > > >
> > > > > Signed-off-by: Longpeng <longpeng2@huawei.com>
> > > > > ---
> > > > >  linux-headers/linux/vhost.h | 10 ++++++++++
> > > > >  1 file changed, 10 insertions(+)
> > > > >
> > > > > diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> > > > > index c998860d7b..c5edd75d15 100644
> > > > > --- a/linux-headers/linux/vhost.h
> > > > > +++ b/linux-headers/linux/vhost.h
> > > > > @@ -150,4 +150,14 @@
> > > > >  /* Get the valid iova range */
> > > > >  #define VHOST_VDPA_GET_IOVA_RANGE      _IOR(VHOST_VIRTIO, 0x78, \
> > > > >                                              struct vhost_vdpa_iova_range)
> > > > > +
> > > > > +/* Get the number of vectors */
> > > > > +#define VHOST_VDPA_GET_VECTORS_NUM     _IOR(VHOST_VIRTIO, 0x79, int)
> > > > > +
> > > > > +/* Get the virtio config size */
> > > > > +#define VHOST_VDPA_GET_CONFIG_SIZE     _IOR(VHOST_VIRTIO, 0x80, int)
> > > > > +
> > > > > +/* Get the number of virtqueues */
> > > > > +#define VHOST_VDPA_GET_VQS_NUM         _IOR(VHOST_VIRTIO, 0x81, int)
> > > > > +
> > > > >  #endif
> > > > > --
> > > > > 2.23.0
> > > > >
> > >
>
Michael S. Tsirkin Jan. 5, 2022, 12:26 p.m. UTC | #10
On Wed, Jan 05, 2022 at 05:09:07PM +0800, Jason Wang wrote:
> On Wed, Jan 5, 2022 at 4:37 PM Longpeng (Mike, Cloud Infrastructure
> Service Product Dept.) <longpeng2@huawei.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Jason Wang [mailto:jasowang@redhat.com]
> > > Sent: Wednesday, January 5, 2022 3:54 PM
> > > To: Michael S. Tsirkin <mst@redhat.com>
> > > Cc: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > > <longpeng2@huawei.com>; Stefan Hajnoczi <stefanha@redhat.com>; Stefano
> > > Garzarella <sgarzare@redhat.com>; Cornelia Huck <cohuck@redhat.com>; pbonzini
> > > <pbonzini@redhat.com>; Gonglei (Arei) <arei.gonglei@huawei.com>; Yechuan
> > > <yechuan@huawei.com>; Huangzhichao <huangzhichao@huawei.com>; qemu-devel
> > > <qemu-devel@nongnu.org>
> > > Subject: Re: [RFC 02/10] vhost: add 3 commands for vhost-vdpa
> > >
> > > On Wed, Jan 5, 2022 at 3:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Jan 05, 2022 at 12:35:53PM +0800, Jason Wang wrote:
> > > > > On Wed, Jan 5, 2022 at 8:59 AM Longpeng(Mike) <longpeng2@huawei.com> wrote:
> > > > > >
> > > > > > From: Longpeng <longpeng2@huawei.com>
> > > > > >
> > > > > > To support the generic vdpa device, we need to add the following ioctls:
> > > > > > - GET_VECTORS_NUM: the count of vectors that are supported
> > > > >
> > > > > Does this mean MSI vectors? If yes, it looks like a layer violation:
> > > > > vhost is transport independent.
> > > >
> > > > Well *guest* needs to know how many vectors device supports.
> > > > I don't think there's a way around that. Do you?
> > >
> > > We have VHOST_SET_VRING/CONFIG_CALL which is per vq. I think we can
> > > simply assume #vqs + 1?
> > >
> > > > Otherwise guests will at best be suboptimal.
> > > >
> > > > >  And it reveals device implementation
> > > > > details which block (cross vendor) migration.
> > > > >
> > > > > Thanks
> > > >
> > > > Not necessarily, userspace can hide this from guest if it
> > > > wants to, just validate.
> > >
> > > If we can hide it at vhost/uAPI level, it would be even better?
> > >
> >
> > Not only MSI vectors, but also queue-size, #vqs, etc.
> 
> MSI is PCI specific, we have non PCI vDPA parent e.g VDUSE/simulator/mlx5
> 
> And it's something that is not guaranteed to be not changed. E.g some
> drivers may choose to allocate MSI during set_status() which can fail
> for various reasons.
> 
> >
> > Maybe the vhost level could expose the hardware's real capabilities
> > and let the userspace (QEMU) do the hiding? The userspace know how
> > to process them.
> 
> #MSI vectors is much more easier to be mediated than queue-size and #vqs.
> 
> For interrupts, we've already had VHOST_SET_X_KICK, we can keep
> allocating eventfd based on #MSI vectors to make it work with any
> number of MSI vectors that the virtual device had.

Right, but if hardware does not support that many, then what?
Just fail? Having a query API would make things somewhat cleaner imho.

> For queue-size, it's Ok to have a new uAPI but it's not a must, Qemu
> can simply fail if SET_VRING_NUM fail.
>
> For #vqs, it's OK to have a new uAPI since the emulated virtio-pci
> device requires knowledge the #vqs in the config space. (still not a
> must, we can enumerate #vqs per device type)
> 
> For the config size, it's OK but not a must, technically we can simply
> relay what guest write to vhost-vdpa. It's just because current Qemu
> require to have it during virtio device initialization.
> 
> Thanks


I agree but these ok things make for a cleaner API I think.

> >
> > > Thanks
> > >
> > > >
> > > >
> > > > > > - GET_CONFIG_SIZE: the size of the virtio config space
> > > > > > - GET_VQS_NUM: the count of virtqueues that are exported
> > > > > >
> > > > > > Signed-off-by: Longpeng <longpeng2@huawei.com>
> > > > > > ---
> > > > > >  linux-headers/linux/vhost.h | 10 ++++++++++
> > > > > >  1 file changed, 10 insertions(+)
> > > > > >
> > > > > > diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> > > > > > index c998860d7b..c5edd75d15 100644
> > > > > > --- a/linux-headers/linux/vhost.h
> > > > > > +++ b/linux-headers/linux/vhost.h
> > > > > > @@ -150,4 +150,14 @@
> > > > > >  /* Get the valid iova range */
> > > > > >  #define VHOST_VDPA_GET_IOVA_RANGE      _IOR(VHOST_VIRTIO, 0x78, \
> > > > > >                                              struct vhost_vdpa_iova_range)
> > > > > > +
> > > > > > +/* Get the number of vectors */
> > > > > > +#define VHOST_VDPA_GET_VECTORS_NUM     _IOR(VHOST_VIRTIO, 0x79, int)
> > > > > > +
> > > > > > +/* Get the virtio config size */
> > > > > > +#define VHOST_VDPA_GET_CONFIG_SIZE     _IOR(VHOST_VIRTIO, 0x80, int)
> > > > > > +
> > > > > > +/* Get the number of virtqueues */
> > > > > > +#define VHOST_VDPA_GET_VQS_NUM         _IOR(VHOST_VIRTIO, 0x81, int)
> > > > > > +
> > > > > >  #endif
> > > > > > --
> > > > > > 2.23.0
> > > > > >
> > > >
> >
Jason Wang Jan. 6, 2022, 2:34 a.m. UTC | #11
On Wed, Jan 5, 2022 at 8:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jan 05, 2022 at 05:09:07PM +0800, Jason Wang wrote:
> > On Wed, Jan 5, 2022 at 4:37 PM Longpeng (Mike, Cloud Infrastructure
> > Service Product Dept.) <longpeng2@huawei.com> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jason Wang [mailto:jasowang@redhat.com]
> > > > Sent: Wednesday, January 5, 2022 3:54 PM
> > > > To: Michael S. Tsirkin <mst@redhat.com>
> > > > Cc: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > > > <longpeng2@huawei.com>; Stefan Hajnoczi <stefanha@redhat.com>; Stefano
> > > > Garzarella <sgarzare@redhat.com>; Cornelia Huck <cohuck@redhat.com>; pbonzini
> > > > <pbonzini@redhat.com>; Gonglei (Arei) <arei.gonglei@huawei.com>; Yechuan
> > > > <yechuan@huawei.com>; Huangzhichao <huangzhichao@huawei.com>; qemu-devel
> > > > <qemu-devel@nongnu.org>
> > > > Subject: Re: [RFC 02/10] vhost: add 3 commands for vhost-vdpa
> > > >
> > > > On Wed, Jan 5, 2022 at 3:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Wed, Jan 05, 2022 at 12:35:53PM +0800, Jason Wang wrote:
> > > > > > On Wed, Jan 5, 2022 at 8:59 AM Longpeng(Mike) <longpeng2@huawei.com> wrote:
> > > > > > >
> > > > > > > From: Longpeng <longpeng2@huawei.com>
> > > > > > >
> > > > > > > To support the generic vdpa device, we need to add the following ioctls:
> > > > > > > - GET_VECTORS_NUM: the count of vectors that are supported
> > > > > >
> > > > > > Does this mean MSI vectors? If yes, it looks like a layer violation:
> > > > > > vhost is transport independent.
> > > > >
> > > > > Well *guest* needs to know how many vectors device supports.
> > > > > I don't think there's a way around that. Do you?
> > > >
> > > > We have VHOST_SET_VRING/CONFIG_CALL which is per vq. I think we can
> > > > simply assume #vqs + 1?
> > > >
> > > > > Otherwise guests will at best be suboptimal.
> > > > >
> > > > > >  And it reveals device implementation
> > > > > > details which block (cross vendor) migration.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Not necessarily, userspace can hide this from guest if it
> > > > > wants to, just validate.
> > > >
> > > > If we can hide it at vhost/uAPI level, it would be even better?
> > > >
> > >
> > > Not only MSI vectors, but also queue-size, #vqs, etc.
> >
> > MSI is PCI specific, we have non PCI vDPA parent e.g VDUSE/simulator/mlx5
> >
> > And it's something that is not guaranteed to be not changed. E.g some
> > drivers may choose to allocate MSI during set_status() which can fail
> > for various reasons.
> >
> > >
> > > Maybe the vhost level could expose the hardware's real capabilities
> > > and let the userspace (QEMU) do the hiding? The userspace know how
> > > to process them.
> >
> > #MSI vectors is much more easier to be mediated than queue-size and #vqs.
> >
> > For interrupts, we've already had VHOST_SET_X_KICK, we can keep
> > allocating eventfd based on #MSI vectors to make it work with any
> > number of MSI vectors that the virtual device had.
>
> Right but if hardware does not support so many then what?
> Just fail?

Or just trigger the callbacks of the vqs that share the vector.

> Having a query API would make things somewhat cleaner imho.

I may be missing something: even if we know #vectors, we still don't know
the associated virtqueues for a dedicated vector?

>
> > For queue-size, it's Ok to have a new uAPI but it's not a must, Qemu
> > can simply fail if SET_VRING_NUM fail.
> >
> > For #vqs, it's OK to have a new uAPI since the emulated virtio-pci
> > device requires knowledge the #vqs in the config space. (still not a
> > must, we can enumerate #vqs per device type)
> >
> > For the config size, it's OK but not a must, technically we can simply
> > relay what guest write to vhost-vdpa. It's just because current Qemu
> > require to have it during virtio device initialization.
> >
> > Thanks
>
>
> I agree but these ok things make for a cleaner API I think.

Right.

Thanks

>
> > >
> > > > Thanks
> > > >
> > > > >
> > > > >
> > > > > > > - GET_CONFIG_SIZE: the size of the virtio config space
> > > > > > > - GET_VQS_NUM: the count of virtqueues that are exported
> > > > > > >
> > > > > > > Signed-off-by: Longpeng <longpeng2@huawei.com>
> > > > > > > ---
> > > > > > >  linux-headers/linux/vhost.h | 10 ++++++++++
> > > > > > >  1 file changed, 10 insertions(+)
> > > > > > >
> > > > > > > diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> > > > > > > index c998860d7b..c5edd75d15 100644
> > > > > > > --- a/linux-headers/linux/vhost.h
> > > > > > > +++ b/linux-headers/linux/vhost.h
> > > > > > > @@ -150,4 +150,14 @@
> > > > > > >  /* Get the valid iova range */
> > > > > > >  #define VHOST_VDPA_GET_IOVA_RANGE      _IOR(VHOST_VIRTIO, 0x78, \
> > > > > > >                                              struct vhost_vdpa_iova_range)
> > > > > > > +
> > > > > > > +/* Get the number of vectors */
> > > > > > > +#define VHOST_VDPA_GET_VECTORS_NUM     _IOR(VHOST_VIRTIO, 0x79, int)
> > > > > > > +
> > > > > > > +/* Get the virtio config size */
> > > > > > > +#define VHOST_VDPA_GET_CONFIG_SIZE     _IOR(VHOST_VIRTIO, 0x80, int)
> > > > > > > +
> > > > > > > +/* Get the number of virtqueues */
> > > > > > > +#define VHOST_VDPA_GET_VQS_NUM         _IOR(VHOST_VIRTIO, 0x81, int)
> > > > > > > +
> > > > > > >  #endif
> > > > > > > --
> > > > > > > 2.23.0
> > > > > > >
> > > > >
> > >
>
Zhijian Li (Fujitsu)" via Jan. 6, 2022, 8 a.m. UTC | #12
> -----Original Message-----
> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Thursday, January 6, 2022 10:34 AM
> To: Michael S. Tsirkin <mst@redhat.com>
> Cc: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpeng2@huawei.com>; Stefan Hajnoczi <stefanha@redhat.com>; Stefano
> Garzarella <sgarzare@redhat.com>; Cornelia Huck <cohuck@redhat.com>; pbonzini
> <pbonzini@redhat.com>; Gonglei (Arei) <arei.gonglei@huawei.com>; Yechuan
> <yechuan@huawei.com>; Huangzhichao <huangzhichao@huawei.com>; qemu-devel
> <qemu-devel@nongnu.org>
> Subject: Re: [RFC 02/10] vhost: add 3 commands for vhost-vdpa
> 
> On Wed, Jan 5, 2022 at 8:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jan 05, 2022 at 05:09:07PM +0800, Jason Wang wrote:
> > > On Wed, Jan 5, 2022 at 4:37 PM Longpeng (Mike, Cloud Infrastructure
> > > Service Product Dept.) <longpeng2@huawei.com> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Jason Wang [mailto:jasowang@redhat.com]
> > > > > Sent: Wednesday, January 5, 2022 3:54 PM
> > > > > To: Michael S. Tsirkin <mst@redhat.com>
> > > > > Cc: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > > > > <longpeng2@huawei.com>; Stefan Hajnoczi <stefanha@redhat.com>; Stefano
> > > > > Garzarella <sgarzare@redhat.com>; Cornelia Huck <cohuck@redhat.com>;
> pbonzini
> > > > > <pbonzini@redhat.com>; Gonglei (Arei) <arei.gonglei@huawei.com>; Yechuan
> > > > > <yechuan@huawei.com>; Huangzhichao <huangzhichao@huawei.com>;
> qemu-devel
> > > > > <qemu-devel@nongnu.org>
> > > > > Subject: Re: [RFC 02/10] vhost: add 3 commands for vhost-vdpa
> > > > >
> > > > > On Wed, Jan 5, 2022 at 3:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Jan 05, 2022 at 12:35:53PM +0800, Jason Wang wrote:
> > > > > > > On Wed, Jan 5, 2022 at 8:59 AM Longpeng(Mike) <longpeng2@huawei.com>
> wrote:
> > > > > > > >
> > > > > > > > From: Longpeng <longpeng2@huawei.com>
> > > > > > > >
> > > > > > > > To support the generic vdpa device, we need to add the following ioctls:
> > > > > > > > - GET_VECTORS_NUM: the count of vectors that are supported
> > > > > > >
> > > > > > > Does this mean MSI vectors? If yes, it looks like a layer violation:
> > > > > > > vhost is transport independent.
> > > > > >
> > > > > > Well *guest* needs to know how many vectors device supports.
> > > > > > I don't think there's a way around that. Do you?
> > > > >
> > > > > We have VHOST_SET_VRING/CONFIG_CALL which is per vq. I think we can
> > > > > simply assume #vqs + 1?
> > > > >
> > > > > > Otherwise guests will at best be suboptimal.
> > > > > >
> > > > > > >  And it reveals device implementation
> > > > > > > details which block (cross vendor) migration.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Not necessarily, userspace can hide this from guest if it
> > > > > > wants to, just validate.
> > > > >
> > > > > If we can hide it at vhost/uAPI level, it would be even better?
> > > > >
> > > >
> > > > Not only MSI vectors, but also queue-size, #vqs, etc.
> > >
> > > MSI is PCI specific, we have non PCI vDPA parent e.g VDUSE/simulator/mlx5
> > >
> > > And it's something that is not guaranteed to be not changed. E.g some
> > > drivers may choose to allocate MSI during set_status() which can fail
> > > for various reasons.
> > >
> > > >
> > > > Maybe the vhost level could expose the hardware's real capabilities
> > > > and let the userspace (QEMU) do the hiding? The userspace know how
> > > > to process them.
> > >
> > > #MSI vectors is much more easier to be mediated than queue-size and #vqs.
> > >
> > > For interrupts, we've already had VHOST_SET_X_KICK, we can keep
> > > allocating eventfd based on #MSI vectors to make it work with any
> > > number of MSI vectors that the virtual device had.
> >
> > Right but if hardware does not support so many then what?
> > Just fail?
> 
> Or just trigger the callback of vqs that shares the vector.
> 

Then we should disable PI if we need to share a vector in this case?

> > Having a query API would make things somewhat cleaner imho.
> 
> I may miss something,  even if we know #vectors, we still don't know
> the associated virtqueues for a dedicated vector?
> 
> >
> > > For queue-size, it's Ok to have a new uAPI but it's not a must, Qemu
> > > can simply fail if SET_VRING_NUM fail.
> > >
> > > For #vqs, it's OK to have a new uAPI since the emulated virtio-pci
> > > device requires knowledge the #vqs in the config space. (still not a
> > > must, we can enumerate #vqs per device type)
> > >
> > > For the config size, it's OK but not a must, technically we can simply
> > > relay what guest write to vhost-vdpa. It's just because current Qemu
> > > require to have it during virtio device initialization.
> > >
> > > Thanks
> >
> >
> > I agree but these ok things make for a cleaner API I think.
> 
> Right.
> 
> Thanks
> 
> >
> > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > >
> > > > > > > > - GET_CONFIG_SIZE: the size of the virtio config space
> > > > > > > > - GET_VQS_NUM: the count of virtqueues that are exported
> > > > > > > >
> > > > > > > > Signed-off-by: Longpeng <longpeng2@huawei.com>
> > > > > > > > ---
> > > > > > > >  linux-headers/linux/vhost.h | 10 ++++++++++
> > > > > > > >  1 file changed, 10 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/linux-headers/linux/vhost.h
> b/linux-headers/linux/vhost.h
> > > > > > > > index c998860d7b..c5edd75d15 100644
> > > > > > > > --- a/linux-headers/linux/vhost.h
> > > > > > > > +++ b/linux-headers/linux/vhost.h
> > > > > > > > @@ -150,4 +150,14 @@
> > > > > > > >  /* Get the valid iova range */
> > > > > > > >  #define VHOST_VDPA_GET_IOVA_RANGE      _IOR(VHOST_VIRTIO, 0x78,
> \
> > > > > > > >                                              struct
> vhost_vdpa_iova_range)
> > > > > > > > +
> > > > > > > > +/* Get the number of vectors */
> > > > > > > > +#define VHOST_VDPA_GET_VECTORS_NUM     _IOR(VHOST_VIRTIO, 0x79,
> int)
> > > > > > > > +
> > > > > > > > +/* Get the virtio config size */
> > > > > > > > +#define VHOST_VDPA_GET_CONFIG_SIZE     _IOR(VHOST_VIRTIO, 0x80,
> int)
> > > > > > > > +
> > > > > > > > +/* Get the number of virtqueues */
> > > > > > > > +#define VHOST_VDPA_GET_VQS_NUM         _IOR(VHOST_VIRTIO, 0x81,
> int)
> > > > > > > > +
> > > > > > > >  #endif
> > > > > > > > --
> > > > > > > > 2.23.0
> > > > > > > >
> > > > > >
> > > >
> >
Michael S. Tsirkin Jan. 6, 2022, 2:09 p.m. UTC | #13
On Thu, Jan 06, 2022 at 10:34:20AM +0800, Jason Wang wrote:
> On Wed, Jan 5, 2022 at 8:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jan 05, 2022 at 05:09:07PM +0800, Jason Wang wrote:
> > > On Wed, Jan 5, 2022 at 4:37 PM Longpeng (Mike, Cloud Infrastructure
> > > Service Product Dept.) <longpeng2@huawei.com> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Jason Wang [mailto:jasowang@redhat.com]
> > > > > Sent: Wednesday, January 5, 2022 3:54 PM
> > > > > To: Michael S. Tsirkin <mst@redhat.com>
> > > > > Cc: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > > > > <longpeng2@huawei.com>; Stefan Hajnoczi <stefanha@redhat.com>; Stefano
> > > > > Garzarella <sgarzare@redhat.com>; Cornelia Huck <cohuck@redhat.com>; pbonzini
> > > > > <pbonzini@redhat.com>; Gonglei (Arei) <arei.gonglei@huawei.com>; Yechuan
> > > > > <yechuan@huawei.com>; Huangzhichao <huangzhichao@huawei.com>; qemu-devel
> > > > > <qemu-devel@nongnu.org>
> > > > > Subject: Re: [RFC 02/10] vhost: add 3 commands for vhost-vdpa
> > > > >
> > > > > On Wed, Jan 5, 2022 at 3:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Jan 05, 2022 at 12:35:53PM +0800, Jason Wang wrote:
> > > > > > > On Wed, Jan 5, 2022 at 8:59 AM Longpeng(Mike) <longpeng2@huawei.com> wrote:
> > > > > > > >
> > > > > > > > From: Longpeng <longpeng2@huawei.com>
> > > > > > > >
> > > > > > > > To support the generic vdpa device, we need to add the following ioctls:
> > > > > > > > - GET_VECTORS_NUM: the count of vectors that are supported
> > > > > > >
> > > > > > > Does this mean MSI vectors? If yes, it looks like a layer violation:
> > > > > > > vhost is transport independent.
> > > > > >
> > > > > > Well *guest* needs to know how many vectors device supports.
> > > > > > I don't think there's a way around that. Do you?
> > > > >
> > > > > We have VHOST_SET_VRING/CONFIG_CALL which is per vq. I think we can
> > > > > simply assume #vqs + 1?
> > > > >
> > > > > > Otherwise guests will at best be suboptimal.
> > > > > >
> > > > > > >  And it reveals device implementation
> > > > > > > details which block (cross vendor) migration.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Not necessarily, userspace can hide this from guest if it
> > > > > > wants to, just validate.
> > > > >
> > > > > If we can hide it at vhost/uAPI level, it would be even better?
> > > > >
> > > >
> > > > Not only MSI vectors, but also queue-size, #vqs, etc.
> > >
> > > MSI is PCI specific, we have non PCI vDPA parent e.g VDUSE/simulator/mlx5
> > >
> > > And it's something that is not guaranteed to be not changed. E.g some
> > > drivers may choose to allocate MSI during set_status() which can fail
> > > for various reasons.
> > >
> > > >
> > > > Maybe the vhost level could expose the hardware's real capabilities
> > > > and let the userspace (QEMU) do the hiding? The userspace know how
> > > > to process them.
> > >
> > > #MSI vectors is much more easier to be mediated than queue-size and #vqs.
> > >
> > > For interrupts, we've already had VHOST_SET_X_KICK, we can keep
> > > allocating eventfd based on #MSI vectors to make it work with any
> > > number of MSI vectors that the virtual device had.
> >
> > Right but if hardware does not support so many then what?
> > Just fail?
> 
> Or just trigger the callback of vqs that shares the vector.


Right, but we want userspace to be able to report this to the guest accurately
if it wants to. The guest can then configure itself correctly.


> > Having a query API would make things somewhat cleaner imho.
> 
> I may miss something,  even if we know #vectors, we still don't know
> the associated virtqueues for a dedicated vector?

This is up to the guest.

> >
> > > For queue-size, it's Ok to have a new uAPI but it's not a must, Qemu
> > > can simply fail if SET_VRING_NUM fail.
> > >
> > > For #vqs, it's OK to have a new uAPI since the emulated virtio-pci
> > > device requires knowledge the #vqs in the config space. (still not a
> > > must, we can enumerate #vqs per device type)
> > >
> > > For the config size, it's OK but not a must, technically we can simply
> > > relay what guest write to vhost-vdpa. It's just because current Qemu
> > > require to have it during virtio device initialization.
> > >
> > > Thanks
> >
> >
> > I agree but these ok things make for a cleaner API I think.
> 
> Right.
> 
> Thanks
> 
> >
> > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > >
> > > > > > > > - GET_CONFIG_SIZE: the size of the virtio config space
> > > > > > > > - GET_VQS_NUM: the count of virtqueues that are exported
> > > > > > > >
> > > > > > > > Signed-off-by: Longpeng <longpeng2@huawei.com>
> > > > > > > > ---
> > > > > > > >  linux-headers/linux/vhost.h | 10 ++++++++++
> > > > > > > >  1 file changed, 10 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> > > > > > > > index c998860d7b..c5edd75d15 100644
> > > > > > > > --- a/linux-headers/linux/vhost.h
> > > > > > > > +++ b/linux-headers/linux/vhost.h
> > > > > > > > @@ -150,4 +150,14 @@
> > > > > > > >  /* Get the valid iova range */
> > > > > > > >  #define VHOST_VDPA_GET_IOVA_RANGE      _IOR(VHOST_VIRTIO, 0x78, \
> > > > > > > >                                              struct vhost_vdpa_iova_range)
> > > > > > > > +
> > > > > > > > +/* Get the number of vectors */
> > > > > > > > +#define VHOST_VDPA_GET_VECTORS_NUM     _IOR(VHOST_VIRTIO, 0x79, int)
> > > > > > > > +
> > > > > > > > +/* Get the virtio config size */
> > > > > > > > +#define VHOST_VDPA_GET_CONFIG_SIZE     _IOR(VHOST_VIRTIO, 0x80, int)
> > > > > > > > +
> > > > > > > > +/* Get the number of virtqueues */
> > > > > > > > +#define VHOST_VDPA_GET_VQS_NUM         _IOR(VHOST_VIRTIO, 0x81, int)
> > > > > > > > +
> > > > > > > >  #endif
> > > > > > > > --
> > > > > > > > 2.23.0
> > > > > > > >
> > > > > >
> > > >
> >
Jason Wang Jan. 7, 2022, 2:41 a.m. UTC | #14
On 2022/1/6 4:00 PM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
>> -----Original Message-----
>> From: Jason Wang [mailto:jasowang@redhat.com]
>> Sent: Thursday, January 6, 2022 10:34 AM
>> To: Michael S. Tsirkin<mst@redhat.com>
>> Cc: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
>> <longpeng2@huawei.com>; Stefan Hajnoczi<stefanha@redhat.com>; Stefano
>> Garzarella<sgarzare@redhat.com>; Cornelia Huck<cohuck@redhat.com>; pbonzini
>> <pbonzini@redhat.com>; Gonglei (Arei)<arei.gonglei@huawei.com>; Yechuan
>> <yechuan@huawei.com>; Huangzhichao<huangzhichao@huawei.com>; qemu-devel
>> <qemu-devel@nongnu.org>
>> Subject: Re: [RFC 02/10] vhost: add 3 commands for vhost-vdpa
>>
>> On Wed, Jan 5, 2022 at 8:26 PM Michael S. Tsirkin<mst@redhat.com>  wrote:
>>> On Wed, Jan 05, 2022 at 05:09:07PM +0800, Jason Wang wrote:
>>>> On Wed, Jan 5, 2022 at 4:37 PM Longpeng (Mike, Cloud Infrastructure
>>>> Service Product Dept.)<longpeng2@huawei.com>  wrote:
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Jason Wang [mailto:jasowang@redhat.com]
>>>>>> Sent: Wednesday, January 5, 2022 3:54 PM
>>>>>> To: Michael S. Tsirkin<mst@redhat.com>
>>>>>> Cc: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
>>>>>> <longpeng2@huawei.com>; Stefan Hajnoczi<stefanha@redhat.com>; Stefano
>>>>>> Garzarella<sgarzare@redhat.com>; Cornelia Huck<cohuck@redhat.com>;
>> pbonzini
>>>>>> <pbonzini@redhat.com>; Gonglei (Arei)<arei.gonglei@huawei.com>; Yechuan
>>>>>> <yechuan@huawei.com>; Huangzhichao<huangzhichao@huawei.com>;
>> qemu-devel
>>>>>> <qemu-devel@nongnu.org>
>>>>>> Subject: Re: [RFC 02/10] vhost: add 3 commands for vhost-vdpa
>>>>>>
>>>>>> On Wed, Jan 5, 2022 at 3:02 PM Michael S. Tsirkin<mst@redhat.com>  wrote:
>>>>>>> On Wed, Jan 05, 2022 at 12:35:53PM +0800, Jason Wang wrote:
>>>>>>>> On Wed, Jan 5, 2022 at 8:59 AM Longpeng(Mike)<longpeng2@huawei.com>
>> wrote:
>>>>>>>>> From: Longpeng<longpeng2@huawei.com>
>>>>>>>>>
>>>>>>>>> To support the generic vdpa device, we need to add the following ioctls:
>>>>>>>>> - GET_VECTORS_NUM: the count of vectors that are supported
>>>>>>>> Does this mean MSI vectors? If yes, it looks like a layer violation:
>>>>>>>> vhost is transport independent.
>>>>>>> Well*guest*  needs to know how many vectors device supports.
>>>>>>> I don't think there's a way around that. Do you?
>>>>>> We have VHOST_SET_VRING/CONFIG_CALL which is per vq. I think we can
>>>>>> simply assume #vqs + 1?
>>>>>>
>>>>>>> Otherwise guests will at best be suboptimal.
>>>>>>>
>>>>>>>>   And it reveals device implementation
>>>>>>>> details which block (cross vendor) migration.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>> Not necessarily, userspace can hide this from guest if it
>>>>>>> wants to, just validate.
>>>>>> If we can hide it at vhost/uAPI level, it would be even better?
>>>>>>
>>>>> Not only MSI vectors, but also queue-size, #vqs, etc.
>>>> MSI is PCI specific, we have non PCI vDPA parent e.g VDUSE/simulator/mlx5
>>>>
>>>> And it's something that is not guaranteed to be not changed. E.g some
>>>> drivers may choose to allocate MSI during set_status() which can fail
>>>> for various reasons.
>>>>
>>>>> Maybe the vhost level could expose the hardware's real capabilities
>>>>> and let the userspace (QEMU) do the hiding? The userspace know how
>>>>> to process them.
>>>> #MSI vectors is much more easier to be mediated than queue-size and #vqs.
>>>>
>>>> For interrupts, we've already had VHOST_SET_X_KICK, we can keep
>>>> allocating eventfd based on #MSI vectors to make it work with any
>>>> number of MSI vectors that the virtual device had.
>>> Right but if hardware does not support so many then what?
>>> Just fail?
>> Or just trigger the callback of vqs that shares the vector.
>>
> Then we should disable PI if we need to share a vector in this case?


I may be missing something, but I don't see any reason for doing this. I think
the irqbypass manager and the arch-specific PI code should deal with
this case.

Ling Shan (cc'ed) told me it has worked in the past.

Thanks


>
Jason Wang Jan. 7, 2022, 2:53 a.m. UTC | #15
On 2022/1/6 10:09 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 06, 2022 at 10:34:20AM +0800, Jason Wang wrote:
>> On Wed, Jan 5, 2022 at 8:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>> On Wed, Jan 05, 2022 at 05:09:07PM +0800, Jason Wang wrote:
>>>> On Wed, Jan 5, 2022 at 4:37 PM Longpeng (Mike, Cloud Infrastructure
>>>> Service Product Dept.) <longpeng2@huawei.com> wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Jason Wang [mailto:jasowang@redhat.com]
>>>>>> Sent: Wednesday, January 5, 2022 3:54 PM
>>>>>> To: Michael S. Tsirkin <mst@redhat.com>
>>>>>> Cc: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
>>>>>> <longpeng2@huawei.com>; Stefan Hajnoczi <stefanha@redhat.com>; Stefano
>>>>>> Garzarella <sgarzare@redhat.com>; Cornelia Huck <cohuck@redhat.com>; pbonzini
>>>>>> <pbonzini@redhat.com>; Gonglei (Arei) <arei.gonglei@huawei.com>; Yechuan
>>>>>> <yechuan@huawei.com>; Huangzhichao <huangzhichao@huawei.com>; qemu-devel
>>>>>> <qemu-devel@nongnu.org>
>>>>>> Subject: Re: [RFC 02/10] vhost: add 3 commands for vhost-vdpa
>>>>>>
>>>>>> On Wed, Jan 5, 2022 at 3:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>> On Wed, Jan 05, 2022 at 12:35:53PM +0800, Jason Wang wrote:
>>>>>>>> On Wed, Jan 5, 2022 at 8:59 AM Longpeng(Mike) <longpeng2@huawei.com> wrote:
>>>>>>>>> From: Longpeng <longpeng2@huawei.com>
>>>>>>>>>
>>>>>>>>> To support the generic vdpa device, we need to add the following ioctls:
>>>>>>>>> - GET_VECTORS_NUM: the count of vectors that are supported
>>>>>>>> Does this mean MSI vectors? If yes, it looks like a layer violation:
>>>>>>>> vhost is transport independent.
>>>>>>> Well *guest* needs to know how many vectors device supports.
>>>>>>> I don't think there's a way around that. Do you?
>>>>>> We have VHOST_SET_VRING/CONFIG_CALL which is per vq. I think we can
>>>>>> simply assume #vqs + 1?
>>>>>>
>>>>>>> Otherwise guests will at best be suboptimal.
>>>>>>>
>>>>>>>>   And it reveals device implementation
>>>>>>>> details which block (cross vendor) migration.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>> Not necessarily, userspace can hide this from guest if it
>>>>>>> wants to, just validate.
>>>>>> If we can hide it at vhost/uAPI level, it would be even better?
>>>>>>
>>>>> Not only MSI vectors, but also queue-size, #vqs, etc.
>>>> MSI is PCI specific, we have non PCI vDPA parent e.g VDUSE/simulator/mlx5
>>>>
>>>> And it's something that is not guaranteed to be not changed. E.g some
>>>> drivers may choose to allocate MSI during set_status() which can fail
>>>> for various reasons.
>>>>
>>>>> Maybe the vhost level could expose the hardware's real capabilities
>>>>> and let the userspace (QEMU) do the hiding? The userspace know how
>>>>> to process them.
>>>> #MSI vectors is much more easier to be mediated than queue-size and #vqs.
>>>>
>>>> For interrupts, we've already had VHOST_SET_X_KICK, we can keep
>>>> allocating eventfd based on #MSI vectors to make it work with any
>>>> number of MSI vectors that the virtual device had.
>>> Right but if hardware does not support so many then what?
>>> Just fail?
>> Or just trigger the callback of vqs that shares the vector.
>
> Right but we want userspace to be able to report this to guest accurately
> if it wants to. Guest can then configure itself correctly.
>
>
>>> Having a query API would make things somewhat cleaner imho.
>> I may miss something,  even if we know #vectors, we still don't know
>> the associated virtqueues for a dedicated vector?
> This is up to the guest.


Just to clarify the possible issue, this only works if the vDPA parent is
using the same irq binding policy as what virtio-pci does in the guest.

Consider vDPA has 3 vectors allocated:

host vector 0: tx/rx
host vector 1: cvq
host vector 2: config

So we return 3 for get_vectors, and the virtual device will have 3
vectors in this case.

But a guest driver may do:

guest vector 0: tx (eventfd0)
guest vector 1: rx (eventfd1)
guest vector 2: cvq/config (eventfd2)

The irq handler of host vector 0 will notify both eventfd0 (guest vector 0)
and eventfd1 (guest vector 1) in this case.
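
A sketch of that fan-out (all names hypothetical): the host-side handler
simply signals every guest call eventfd mapped to its vector:

    #include <stdint.h>
    #include <unistd.h>

    #define MAX_HOST_VECTORS   3
    #define MAX_GUEST_PER_HOST 8

    /* mapping table: each host vector fans out to N guest call eventfds */
    struct vector_relay {
        int guest_fds[MAX_HOST_VECTORS][MAX_GUEST_PER_HOST];
        int nguest[MAX_HOST_VECTORS];
    };

    static void host_vector_fired(struct vector_relay *r, int host_vector)
    {
        uint64_t one = 1;
        int i;

        /* e.g. host vector 0 (tx/rx) signals eventfd0 (tx) and eventfd1 (rx) */
        for (i = 0; i < r->nguest[host_vector]; i++)
            write(r->guest_fds[host_vector][i], &one, sizeof(one));
    }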

And using such "vectors passthrough" may block migration between the 
vDPA device where the #vectors is the only difference.

Thanks


>
>>>> For queue-size, it's Ok to have a new uAPI but it's not a must, Qemu
>>>> can simply fail if SET_VRING_NUM fail.
>>>>
>>>> For #vqs, it's OK to have a new uAPI since the emulated virtio-pci
>>>> device requires knowledge the #vqs in the config space. (still not a
>>>> must, we can enumerate #vqs per device type)
>>>>
>>>> For the config size, it's OK but not a must, technically we can simply
>>>> relay what guest write to vhost-vdpa. It's just because current Qemu
>>>> require to have it during virtio device initialization.
>>>>
>>>> Thanks
>>>
>>> I agree but these ok things make for a cleaner API I think.
>> Right.
>>
>> Thanks
>>
>>>>>> Thanks
>>>>>>
>>>>>>>
>>>>>>>>> - GET_CONFIG_SIZE: the size of the virtio config space
>>>>>>>>> - GET_VQS_NUM: the count of virtqueues that are exported
>>>>>>>>>
>>>>>>>>> Signed-off-by: Longpeng <longpeng2@huawei.com>
>>>>>>>>> ---
>>>>>>>>>   linux-headers/linux/vhost.h | 10 ++++++++++
>>>>>>>>>   1 file changed, 10 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
>>>>>>>>> index c998860d7b..c5edd75d15 100644
>>>>>>>>> --- a/linux-headers/linux/vhost.h
>>>>>>>>> +++ b/linux-headers/linux/vhost.h
>>>>>>>>> @@ -150,4 +150,14 @@
>>>>>>>>>   /* Get the valid iova range */
>>>>>>>>>   #define VHOST_VDPA_GET_IOVA_RANGE      _IOR(VHOST_VIRTIO, 0x78, \
>>>>>>>>>                                               struct vhost_vdpa_iova_range)
>>>>>>>>> +
>>>>>>>>> +/* Get the number of vectors */
>>>>>>>>> +#define VHOST_VDPA_GET_VECTORS_NUM     _IOR(VHOST_VIRTIO, 0x79, int)
>>>>>>>>> +
>>>>>>>>> +/* Get the virtio config size */
>>>>>>>>> +#define VHOST_VDPA_GET_CONFIG_SIZE     _IOR(VHOST_VIRTIO, 0x80, int)
>>>>>>>>> +
>>>>>>>>> +/* Get the number of virtqueues */
>>>>>>>>> +#define VHOST_VDPA_GET_VQS_NUM         _IOR(VHOST_VIRTIO, 0x81, int)
>>>>>>>>> +
>>>>>>>>>   #endif
>>>>>>>>> --
>>>>>>>>> 2.23.0
>>>>>>>>>

Patch

diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index c998860d7b..c5edd75d15 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -150,4 +150,14 @@ 
 /* Get the valid iova range */
 #define VHOST_VDPA_GET_IOVA_RANGE	_IOR(VHOST_VIRTIO, 0x78, \
 					     struct vhost_vdpa_iova_range)
+
+/* Get the number of vectors */
+#define VHOST_VDPA_GET_VECTORS_NUM	_IOR(VHOST_VIRTIO, 0x79, int)
+
+/* Get the virtio config size */
+#define VHOST_VDPA_GET_CONFIG_SIZE	_IOR(VHOST_VIRTIO, 0x80, int)
+
+/* Get the number of virtqueues */
+#define VHOST_VDPA_GET_VQS_NUM		_IOR(VHOST_VIRTIO, 0x81, int)
+
 #endif