diff mbox series

[1/1] vhost-vsock: add VIRTIO_F_RING_PACKED to feaure_bits

Message ID 20240429113334.2454197-1-pasic@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series [1/1] vhost-vsock: add VIRTIO_F_RING_PACKED to feaure_bits | expand

Commit Message

Halil Pasic April 29, 2024, 11:33 a.m. UTC
Not having VIRTIO_F_RING_PACKED in feature_bits[] is a problem when the
vhost-vsock device does not offer the feature bit VIRTIO_F_RING_PACKED
but the in QEMU device is configured to try to use the packed layout
(the virtio property "packed" is on).

As of today, the  Linux kernel vhost-vsock device does not support the
packed queue layout (as vhost does not support packed), and does not
offer VIRTIO_F_RING_PACKED. Thus when for example a vhost-vsock-ccw is
used with packed=on, VIRTIO_F_RING_PACKED ends up being negotiated,
despite the fact that the device does not actually support it, and
one gets to keep the pieces.

Fixes: 74b3e46630 ("virtio: add property to enable packed virtqueue")
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
---

This is a minimal fix, that follows the current patterns in the
codebase, and not necessarily the best one.

I don't quite understand why vhost_get_features() works the way
it works. Fortunately it is documented, so let me quote the
documentation.

"""
/**
 * vhost_get_features() - return a sanitised set of feature bits
 * @hdev: common vhost_dev structure
 * @feature_bits: pointer to terminated table of feature bits
 * @features: original feature set
 *
 * This returns a set of features bits that is an intersection of what
 * is supported by the vhost backend (hdev->features), the supported
 * feature_bits and the requested feature set.
 */
uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits,
                            uint64_t features);
"""

Based on this I would expect the following statement to be true: if a
feature bit is not in feature_bits then the corresponding bit in the
return value is guaranteed to be not set (regardless of the values of
the 3rd arguments and hdev->features).

The implementation however does the following: if the feature bit is not
listed in feature_bits (2nd argument) then the corresponding bit in the
return value is set iff the corresponding bit in the 3rd argument
(features) is set (i.e. it does not matter what hdev->features and thus
the vhost backend says).

The documentation however does kind of state, that feature_bits is
supposed to contain the supported features. And under the assumption
that feature bit not in feature_bits implies that the corresponding bit
must not be set in the 3rd argument (features), then even with the
current implementation we do end up with the intersection of the three
as stated. And then vsock would be at fault for violating that
assumption, and my fix would be the best thing to do -- I guess.

Is the implementation the way it is for a good reason, I can't judge
that with certainty for myself.

But I'm pretty convinced that the current approach is fragile,
especially for the feature bits form the range 24 to 40, as those are
not specific to a device.

BTW vsock also lacks VIRTIO_F_ACCESS_PLATFORM, and VIRTIO_F_RING_RESET
as well while vhost-net has both.

If our design is indeed to make the individual devices responsible for
having a complete list of possible features in feature_bits, then at
least having a common macro for the non-device specific features would
make sense to me.

On the other hand, I'm also very happy to send a patch which changes the
behavior of vhost_get_features(), should the community decide that the
current behavior does not make all that much sense -- I lean towards:
probably it does not make much sense, but things like
VIRTIO_F_ACCESS_PLATFORM, which are mandatory feature bits, need careful
consideration, because there vhost can't do so we just won't offer it
and proceed on our merry way is not the right behavior.

Please comment!

Regards,
Halil
---
 hw/virtio/vhost-vsock-common.c | 1 +
 1 file changed, 1 insertion(+)


base-commit: fd87be1dada5672f877e03c2ca8504458292c479

Comments

Halil Pasic May 7, 2024, 7:26 p.m. UTC | #1
On Mon, 29 Apr 2024 13:33:34 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> Not having VIRTIO_F_RING_PACKED in feature_bits[] is a problem when the
> vhost-vsock device does not offer the feature bit VIRTIO_F_RING_PACKED
> but the in QEMU device is configured to try to use the packed layout
> (the virtio property "packed" is on).

polite ping
Halil Pasic May 15, 2024, 10:41 p.m. UTC | #2
On Tue, 7 May 2024 21:26:30 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> > Not having VIRTIO_F_RING_PACKED in feature_bits[] is a problem when the
> > vhost-vsock device does not offer the feature bit VIRTIO_F_RING_PACKED
> > but the in QEMU device is configured to try to use the packed layout
> > (the virtio property "packed" is on).  
> 
> polite ping

ping
Stefano Garzarella May 16, 2024, 8:39 a.m. UTC | #3
On Mon, Apr 29, 2024 at 01:33:34PM GMT, Halil Pasic wrote:
>Not having VIRTIO_F_RING_PACKED in feature_bits[] is a problem when the
>vhost-vsock device does not offer the feature bit VIRTIO_F_RING_PACKED
>but the in QEMU device is configured to try to use the packed layout
>(the virtio property "packed" is on).
>
>As of today, the  Linux kernel vhost-vsock device does not support the
>packed queue layout (as vhost does not support packed), and does not
>offer VIRTIO_F_RING_PACKED. Thus when for example a vhost-vsock-ccw is
>used with packed=on, VIRTIO_F_RING_PACKED ends up being negotiated,
>despite the fact that the device does not actually support it, and
>one gets to keep the pieces.
>
>Fixes: 74b3e46630 ("virtio: add property to enable packed virtqueue")
>Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
>Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
>---
>
>This is a minimal fix, that follows the current patterns in the
>codebase, and not necessarily the best one.

Yeah, I did something similar with commit 562a7d23bf ("vhost: mask 
VIRTIO_F_RING_RESET for vhost and vhost-user devices") so I think for 
now is the right approach.

I suggest to check also other devices like we did in that commit (e.g.  
hw/scsi/vhost-scsi.c, hw/scsi/vhost-user-scsi.c, etc. )

>
>I don't quite understand why vhost_get_features() works the way
>it works. Fortunately it is documented, so let me quote the
>documentation.
>
>"""
>/**
> * vhost_get_features() - return a sanitised set of feature bits
> * @hdev: common vhost_dev structure
> * @feature_bits: pointer to terminated table of feature bits
> * @features: original feature set
> *
> * This returns a set of features bits that is an intersection of what
> * is supported by the vhost backend (hdev->features), the supported
> * feature_bits and the requested feature set.
> */
>uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits,
>                            uint64_t features);
>"""
>
>Based on this I would expect the following statement to be true: if a
>feature bit is not in feature_bits then the corresponding bit in the
>return value is guaranteed to be not set (regardless of the values of
>the 3rd arguments and hdev->features).
>
>The implementation however does the following: if the feature bit is not
>listed in feature_bits (2nd argument) then the corresponding bit in the
>return value is set iff the corresponding bit in the 3rd argument
>(features) is set (i.e. it does not matter what hdev->features and thus
>the vhost backend says).
>
>The documentation however does kind of state, that feature_bits is
>supposed to contain the supported features. And under the assumption
>that feature bit not in feature_bits implies that the corresponding bit
>must not be set in the 3rd argument (features), then even with the
>current implementation we do end up with the intersection of the three
>as stated. And then vsock would be at fault for violating that
>assumption, and my fix would be the best thing to do -- I guess.
>
>Is the implementation the way it is for a good reason, I can't judge
>that with certainty for myself.

Yes, I think we should fix the documentation, and after a few years of 
not looking at it I'm confused again about what it does.

But re-reading my commit for VIRTIO_F_RING_RESET, it seems that I had 
interpreted `feature_bits` (2nd argument) as a list of features that 
QEMU doesn't know how to emulate and therefore are required by the 
backend (vhost/vhost-user/vdpa). Because the problem is that `features` 
(3rd argument) is a set of features required by the driver that can be 
provided by both QEMU and the backend.

>
>But I'm pretty convinced that the current approach is fragile,
>especially for the feature bits form the range 24 to 40, as those are
>not specific to a device.
>
>BTW vsock also lacks VIRTIO_F_ACCESS_PLATFORM, and VIRTIO_F_RING_RESET
>as well while vhost-net has both.

VIRTIO_F_RING_RESET is just above the line added by this patch.

>
>If our design is indeed to make the individual devices responsible for
>having a complete list of possible features in feature_bits, then at
>least having a common macro for the non-device specific features would
>make sense to me.

Yeah, I agree on this!

>
>On the other hand, I'm also very happy to send a patch which changes the
>behavior of vhost_get_features(), should the community decide that the
>current behavior does not make all that much sense -- I lean towards:
>probably it does not make much sense, but things like
>VIRTIO_F_ACCESS_PLATFORM, which are mandatory feature bits, need careful
>consideration, because there vhost can't do so we just won't offer it
>and proceed on our merry way is not the right behavior.
>
>Please comment!

Maybe we should discuss it in another thread, but I agree that we should 
fix it in someway. Thank you for raising this discussion!

>
>Regards,
>Halil
>---
> hw/virtio/vhost-vsock-common.c | 1 +
> 1 file changed, 1 insertion(+)

This patch LGTM, but as I mention we should fix other devices as well,
but maybe we can do with the common macro you suggested in another 
patch.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

>
>diff --git a/hw/virtio/vhost-vsock-common.c b/hw/virtio/vhost-vsock-common.c
>index 12ea87d7a7..fd88df2560 100644
>--- a/hw/virtio/vhost-vsock-common.c
>+++ b/hw/virtio/vhost-vsock-common.c
>@@ -22,6 +22,7 @@
> const int feature_bits[] = {
>     VIRTIO_VSOCK_F_SEQPACKET,
>     VIRTIO_F_RING_RESET,
>+    VIRTIO_F_RING_PACKED,
>     VHOST_INVALID_FEATURE_BIT
> };
>
>
>base-commit: fd87be1dada5672f877e03c2ca8504458292c479
>-- 
>2.40.1
>
>
Halil Pasic May 27, 2024, 11:27 a.m. UTC | #4
On Thu, 16 May 2024 10:39:42 +0200
Stefano Garzarella <sgarzare@redhat.com> wrote:

[..]
> >---
> >
> >This is a minimal fix, that follows the current patterns in the
> >codebase, and not necessarily the best one.  
> 
> Yeah, I did something similar with commit 562a7d23bf ("vhost: mask 
> VIRTIO_F_RING_RESET for vhost and vhost-user devices") so I think for 
> now is the right approach.
> 
> I suggest to check also other devices like we did in that commit (e.g.  
> hw/scsi/vhost-scsi.c, hw/scsi/vhost-user-scsi.c, etc. )

Hi Stefano!

Thank you for chiming in, and sorry for the late response. I was hoping
that Michael is going to chime in and that I can base my reply on his
take. Anyway here I  go.

A very valid observation! I do agree that we need this for
basically every vhost device, and since:
* net/vhost-vdpa.c
* hw/net/vhost_net.c
* hw/virtio/vhost-user-fs.c
already have it, that translates to shotgun it to the rest. Which
isn't nice in my opinion, which is why I am hoping for a discussion
on this topic, and a better solution (even if it turns out to be
something like a common macro).
[..]
> >
> >The documentation however does kind of state, that feature_bits is
> >supposed to contain the supported features. And under the assumption
> >that feature bit not in feature_bits implies that the corresponding bit
> >must not be set in the 3rd argument (features), then even with the
> >current implementation we do end up with the intersection of the three
> >as stated. And then vsock would be at fault for violating that
> >assumption, and my fix would be the best thing to do -- I guess.
> >
> >Is the implementation the way it is for a good reason, I can't judge
> >that with certainty for myself.  
> 
> Yes, I think we should fix the documentation, and after a few years of 
> not looking at it I'm confused again about what it does.
> 

I would prefer to fix the algorithm and make whole thing less fragile.

> But re-reading my commit for VIRTIO_F_RING_RESET, it seems that I had 
> interpreted `feature_bits` (2nd argument) as a list of features that 
> QEMU doesn't know how to emulate and therefore are required by the 
> backend (vhost/vhost-user/vdpa). Because the problem is that `features` 
> (3rd argument) is a set of features required by the driver that can be 
> provided by both QEMU and the backend.

Hm. I would say, this does sound like the sanest explanation, that might
justify the current code, but I will argue that for me, it isn't sane
enough.

Here comes my argument. 

1) The uses is explicitly asking for a vhost device and giving the user
a non vhost device is not an option.

2) The whole purpose of vhost is that at least the data plane is
implemented outside of QEMU (I am maybe a little sloppy here with
dataplane). That means a rather substantial portion of the device
implementation is not in QEMU, while QEMU remains in charge of the
setup.

3) Thus I would argue, that all the "transport feature bits" from 24 to
40 should have a corresponding vhost feature because the vhost part needs
some sort of a support. 

What do we have there in bits from 24 to 40 according to the spec?
* VIRTIO_F_INDIRECT_DESC
* VIRTIO_F_EVENT_IDX
* VIRTIO_F_VERSION_1
* VIRTIO_F_ACCESS_PLATFORM
* VIRTIO_F_RING_PACKED
* VIRTIO_F_IN_ORDER
* VIRTIO_F_ORDER_PLATFORM
* VIRTIO_F_SR_IOV
* VIRTIO_F_NOTIFICATION_DATA
* VIRTIO_F_NOTIF_CONFIG_DATA
* VIRTIO_F_RING_RESET
and for transitional:
* VIRTIO_F_NOTIFY_ON_EMPTY
* VIRTIO_F_ANY_LAYOUT
* UNUSED

I would say, form these only VIRTIO_F_SR_IOV and
VIRTIO_F_NOTIF_CONFIG_DATA look iffy in a sense things may work out
for vhost devices without the vhost part doing something for it. And
even there, I don't think it would hurt to make vhost part of the
negotiation (I don't think those are supported by QEMU at this point).

I would very much prefer having a consolidated and safe handling for
these.

4) I would also argue that a bunch of the device specific feature bits
should have vhost feature bits as well for the same reason:
features are also such that for a vhost device, the vhost part needs
some sort of a support.

Looking through all of these would require a lot of time, so instead
of that, let me use SCSI as an example. The features are:
* VIRTIO_SCSI_F_INOUT
* VIRTIO_SCSI_F_HOTPLUG
* VIRTIO_SCSI_F_CHANGE
* VIRTIO_SCSI_F_T10_PI

The in the Linux kernel we have
        VHOST_SCSI_FEATURES = VHOST_FEATURES | (1ULL << VIRTIO_SCSI_F_HOTPLUG) |
                                               (1ULL << VIRTIO_SCSI_F_T10_PI)
but in QEMU kernel_feature_bits does not have 
VIRTIO_SCSI_F_T10_PI which together does not make much sense to me. And I would
also expect VIRTIO_SCSI_F_INOUT to be a part of the negotiation, because 
to me that the side that is processing the queue is the one that should
care about it, but I don't think it is supported by QEMU at all, and
then not negotiating it is fine. VIRTIO_SCSI_F_HOTPLUG is in QEMU's
kernel_feature_bits and thus negotiated. in  So the only one that can be
done purely in QEMU seems to be VIRTIO_SCSI_F_CHANGE.

And for vsock we have
VIRTIO_VSOCK_F_SEQPACKET and VIRTIO_VSOCK_F_STREAM, with VIRTIO_VSOCK_F_STREAM
being strange in a sense that the spec says "If no feature bit is set,
only stream socket type is supported. If VIRTIO_VSOCK_F_SEQPACKET has
been negotiated, the device MAY act as if VIRTIO_VSOCK_F_STREAM has also
been negotiated."

VIRTIO_VSOCK_F_SEQPACKET is negotiated with vhost.

It seems for whatever reason we just assume that the vhost device
supports VIRTIO_VSOCK_F_STREAM and that is why we don't negotiate it. I
would assume based on what I see that the feature VIRTIO_VSOCK_F_STREAM
was retrofitted, and that is what we ended up with. Thus I guess now
we have an implicit rule that any QEMU compatible vhost-vsock
implementation would have to support that. But in practice we care only
about Linux, and thus it does not matter.

5) Based on the following, I would very much prefer a per device list of
features with the semantic "hey QEMU can do that feature without any
specialized vhost-device support (e.g. like VIRTIO_SCSI_F_CHANGE)" over
the current list with the semantics "these are the features that
need vhost backend support, and anything else will just work out". That
way if an omission is made, we would end up with the usually safer
under-indication  instead of the current over-indication.


@Michael, Jason: Could you guys chime in?

> 
> >
> >But I'm pretty convinced that the current approach is fragile,
> >especially for the feature bits form the range 24 to 40, as those are
> >not specific to a device.
> >
> >BTW vsock also lacks VIRTIO_F_ACCESS_PLATFORM, and VIRTIO_F_RING_RESET
> >as well while vhost-net has both.  
> 
> VIRTIO_F_RING_RESET is just above the line added by this patch.
> 

My bad! :)
> >

> >If our design is indeed to make the individual devices responsible for
> >having a complete list of possible features in feature_bits, then at
> >least having a common macro for the non-device specific features would
> >make sense to me.  
> 
> Yeah, I agree on this!

I guess, "possible features" shifted towards "possible features that QEMU
can not provide without the vhost backend".

But some sort of a common list does seem viable. I would however prefer
a can do without list, or maybe even two ("can do without because I do it
myself and alone" and "can do without, because although I must rely on a
vhost capability to provide that feature, the presence of that capability
is not indicated via the feature bits and based on knowing how things are
I'm assuming the capability is there although not specifically
indicated".

> 
> >
> >On the other hand, I'm also very happy to send a patch which changes the
> >behavior of vhost_get_features(), should the community decide that the
> >current behavior does not make all that much sense -- I lean towards:
> >probably it does not make much sense, but things like
> >VIRTIO_F_ACCESS_PLATFORM, which are mandatory feature bits, need careful
> >consideration, because there vhost can't do so we just won't offer it
> >and proceed on our merry way is not the right behavior.
> >
> >Please comment!  
> 
> Maybe we should discuss it in another thread, but I agree that we should 
> fix it in someway. Thank you for raising this discussion!
> 

Hm, I ended up replying in this thread because we still don't have
closure on this patch. But if you think it makes, feel free to start a
different thread, and please do include me.


> >
> >Regards,
> >Halil
> >---
> > hw/virtio/vhost-vsock-common.c | 1 +
> > 1 file changed, 1 insertion(+)  
> 
> This patch LGTM, but as I mention we should fix other devices as well,
> but maybe we can do with the common macro you suggested in another 
> patch.

I agree, this should be definitely another patch or even series
depending on how the discussion pans out.

Regards,
Halil
Jason Wang May 28, 2024, 3:25 a.m. UTC | #5
On Mon, May 27, 2024 at 7:27 PM Halil Pasic <pasic@linux.ibm.com> wrote:
>
> On Thu, 16 May 2024 10:39:42 +0200
> Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> [..]
> > >---
> > >
> > >This is a minimal fix, that follows the current patterns in the
> > >codebase, and not necessarily the best one.
> >
> > Yeah, I did something similar with commit 562a7d23bf ("vhost: mask
> > VIRTIO_F_RING_RESET for vhost and vhost-user devices") so I think for
> > now is the right approach.
> >
> > I suggest to check also other devices like we did in that commit (e.g.
> > hw/scsi/vhost-scsi.c, hw/scsi/vhost-user-scsi.c, etc. )
>
> Hi Stefano!
>
> Thank you for chiming in, and sorry for the late response. I was hoping
> that Michael is going to chime in and that I can base my reply on his
> take. Anyway here I  go.
>
> A very valid observation! I do agree that we need this for
> basically every vhost device, and since:
> * net/vhost-vdpa.c
> * hw/net/vhost_net.c
> * hw/virtio/vhost-user-fs.c
> already have it, that translates to shotgun it to the rest. Which
> isn't nice in my opinion, which is why I am hoping for a discussion
> on this topic, and a better solution (even if it turns out to be
> something like a common macro).
> [..]
> > >
> > >The documentation however does kind of state, that feature_bits is
> > >supposed to contain the supported features. And under the assumption
> > >that feature bit not in feature_bits implies that the corresponding bit
> > >must not be set in the 3rd argument (features), then even with the
> > >current implementation we do end up with the intersection of the three
> > >as stated. And then vsock would be at fault for violating that
> > >assumption, and my fix would be the best thing to do -- I guess.
> > >
> > >Is the implementation the way it is for a good reason, I can't judge
> > >that with certainty for myself.
> >
> > Yes, I think we should fix the documentation, and after a few years of
> > not looking at it I'm confused again about what it does.
> >
>
> I would prefer to fix the algorithm and make whole thing less fragile.
>
> > But re-reading my commit for VIRTIO_F_RING_RESET, it seems that I had
> > interpreted `feature_bits` (2nd argument) as a list of features that
> > QEMU doesn't know how to emulate and therefore are required by the
> > backend (vhost/vhost-user/vdpa). Because the problem is that `features`
> > (3rd argument) is a set of features required by the driver that can be
> > provided by both QEMU and the backend.
>
> Hm. I would say, this does sound like the sanest explanation, that might
> justify the current code, but I will argue that for me, it isn't sane
> enough.
>
> Here comes my argument.
>
> 1) The uses is explicitly asking for a vhost device and giving the user
> a non vhost device is not an option.
>
> 2) The whole purpose of vhost is that at least the data plane is
> implemented outside of QEMU (I am maybe a little sloppy here with
> dataplane). That means a rather substantial portion of the device
> implementation is not in QEMU, while QEMU remains in charge of the
> setup.
>
> 3) Thus I would argue, that all the "transport feature bits" from 24 to
> 40 should have a corresponding vhost feature because the vhost part needs
> some sort of a support.
>
> What do we have there in bits from 24 to 40 according to the spec?
> * VIRTIO_F_INDIRECT_DESC
> * VIRTIO_F_EVENT_IDX
> * VIRTIO_F_VERSION_1
> * VIRTIO_F_ACCESS_PLATFORM
> * VIRTIO_F_RING_PACKED
> * VIRTIO_F_IN_ORDER
> * VIRTIO_F_ORDER_PLATFORM
> * VIRTIO_F_SR_IOV
> * VIRTIO_F_NOTIFICATION_DATA
> * VIRTIO_F_NOTIF_CONFIG_DATA
> * VIRTIO_F_RING_RESET
> and for transitional:
> * VIRTIO_F_NOTIFY_ON_EMPTY
> * VIRTIO_F_ANY_LAYOUT
> * UNUSED
>
> I would say, form these only VIRTIO_F_SR_IOV and
> VIRTIO_F_NOTIF_CONFIG_DATA look iffy in a sense things may work out
> for vhost devices without the vhost part doing something for it. And
> even there, I don't think it would hurt to make vhost part of the
> negotiation (I don't think those are supported by QEMU at this point).
>
> I would very much prefer having a consolidated and safe handling for
> these.
>
> 4) I would also argue that a bunch of the device specific feature bits
> should have vhost feature bits as well for the same reason:
> features are also such that for a vhost device, the vhost part needs
> some sort of a support.
>
> Looking through all of these would require a lot of time, so instead
> of that, let me use SCSI as an example. The features are:
> * VIRTIO_SCSI_F_INOUT
> * VIRTIO_SCSI_F_HOTPLUG
> * VIRTIO_SCSI_F_CHANGE
> * VIRTIO_SCSI_F_T10_PI
>
> The in the Linux kernel we have
>         VHOST_SCSI_FEATURES = VHOST_FEATURES | (1ULL << VIRTIO_SCSI_F_HOTPLUG) |
>                                                (1ULL << VIRTIO_SCSI_F_T10_PI)
> but in QEMU kernel_feature_bits does not have
> VIRTIO_SCSI_F_T10_PI which together does not make much sense to me. And I would
> also expect VIRTIO_SCSI_F_INOUT to be a part of the negotiation, because
> to me that the side that is processing the queue is the one that should
> care about it, but I don't think it is supported by QEMU at all, and
> then not negotiating it is fine. VIRTIO_SCSI_F_HOTPLUG is in QEMU's
> kernel_feature_bits and thus negotiated. in  So the only one that can be
> done purely in QEMU seems to be VIRTIO_SCSI_F_CHANGE.
>
> And for vsock we have
> VIRTIO_VSOCK_F_SEQPACKET and VIRTIO_VSOCK_F_STREAM, with VIRTIO_VSOCK_F_STREAM
> being strange in a sense that the spec says "If no feature bit is set,
> only stream socket type is supported. If VIRTIO_VSOCK_F_SEQPACKET has
> been negotiated, the device MAY act as if VIRTIO_VSOCK_F_STREAM has also
> been negotiated."
>
> VIRTIO_VSOCK_F_SEQPACKET is negotiated with vhost.
>
> It seems for whatever reason we just assume that the vhost device
> supports VIRTIO_VSOCK_F_STREAM and that is why we don't negotiate it. I
> would assume based on what I see that the feature VIRTIO_VSOCK_F_STREAM
> was retrofitted, and that is what we ended up with. Thus I guess now
> we have an implicit rule that any QEMU compatible vhost-vsock
> implementation would have to support that. But in practice we care only
> about Linux, and thus it does not matter.
>
> 5) Based on the following, I would very much prefer a per device list of
> features with the semantic "hey QEMU can do that feature without any
> specialized vhost-device support (e.g. like VIRTIO_SCSI_F_CHANGE)"

Indeed the current code is kind of tricky and may need better
documentation. But the problem is some features were datapath related
and they are supported since the birth of a specific vhost device. For
example, some GSO related features (actually, it's not a feature of
vhost-net but TUN/TAP).

And I've found another interesting thing, for RING_REST, actually we
don't need to do anything but we have the following commits:

313389be06ff6 ("vhost-net: support VIRTIO_F_RING_RESET")
2a3552baafb ("vhost: vhost-kernel: enable vq reset feature")

Technically, they are not necessary as RING_RESET for vhost-kernel
doesn't require any additional new ioctls. But it's too late to change
as the kernel commit has been merged.

> over
> the current list with the semantics "these are the features that
> need vhost backend support, and anything else will just work out". That
> way if an omission is made, we would end up with the usually safer
> under-indication  instead of the current over-indication.
>
>
> @Michael, Jason: Could you guys chime in?

Another issue is that it seems to require a change of the semantic of
VHOST_GET_FEATURES. If my understanding is true, it seems a
non-trivial change which I'm not sure it's worth to bother.

Thanks
Stefano Garzarella May 28, 2024, 3:32 p.m. UTC | #6
On Mon, May 27, 2024 at 01:27:10PM GMT, Halil Pasic wrote:
>On Thu, 16 May 2024 10:39:42 +0200
>Stefano Garzarella <sgarzare@redhat.com> wrote:
>
>[..]
>> >---
>> >
>> >This is a minimal fix, that follows the current patterns in the
>> >codebase, and not necessarily the best one.
>>
>> Yeah, I did something similar with commit 562a7d23bf ("vhost: mask
>> VIRTIO_F_RING_RESET for vhost and vhost-user devices") so I think for
>> now is the right approach.
>>
>> I suggest to check also other devices like we did in that commit (e.g.
>> hw/scsi/vhost-scsi.c, hw/scsi/vhost-user-scsi.c, etc. )
>
>Hi Stefano!
>
>Thank you for chiming in, and sorry for the late response. I was hoping
>that Michael is going to chime in and that I can base my reply on his
>take. Anyway here I  go.
>
>A very valid observation! I do agree that we need this for
>basically every vhost device, and since:
>* net/vhost-vdpa.c
>* hw/net/vhost_net.c
>* hw/virtio/vhost-user-fs.c
>already have it, that translates to shotgun it to the rest. Which
>isn't nice in my opinion, which is why I am hoping for a discussion
>on this topic, and a better solution (even if it turns out to be
>something like a common macro).

Yeah, I see your point and I agree on a better solution.

>[..]
>> >
>> >The documentation however does kind of state, that feature_bits is
>> >supposed to contain the supported features. And under the assumption
>> >that feature bit not in feature_bits implies that the corresponding bit
>> >must not be set in the 3rd argument (features), then even with the
>> >current implementation we do end up with the intersection of the three
>> >as stated. And then vsock would be at fault for violating that
>> >assumption, and my fix would be the best thing to do -- I guess.
>> >
>> >Is the implementation the way it is for a good reason, I can't judge
>> >that with certainty for myself.
>>
>> Yes, I think we should fix the documentation, and after a few years of
>> not looking at it I'm confused again about what it does.
>>
>
>I would prefer to fix the algorithm and make whole thing less fragile.
>
>> But re-reading my commit for VIRTIO_F_RING_RESET, it seems that I had
>> interpreted `feature_bits` (2nd argument) as a list of features that
>> QEMU doesn't know how to emulate and therefore are required by the
>> backend (vhost/vhost-user/vdpa). Because the problem is that `features`
>> (3rd argument) is a set of features required by the driver that can be
>> provided by both QEMU and the backend.
>
>Hm. I would say, this does sound like the sanest explanation, that might
>justify the current code, but I will argue that for me, it isn't sane
>enough.
>
>Here comes my argument.
>
>1) The uses is explicitly asking for a vhost device and giving the user
>a non vhost device is not an option.

I didn't get this point :-( can you elaborate?

>
>2) The whole purpose of vhost is that at least the data plane is
>implemented outside of QEMU (I am maybe a little sloppy here with
>dataplane). That means a rather substantial portion of the device
>implementation is not in QEMU, while QEMU remains in charge of the
>setup.

Yep

>
>3) Thus I would argue, that all the "transport feature bits" from 24 to
>40 should have a corresponding vhost feature because the vhost part needs
>some sort of a support.
>
>What do we have there in bits from 24 to 40 according to the spec?
>* VIRTIO_F_INDIRECT_DESC
>* VIRTIO_F_EVENT_IDX
>* VIRTIO_F_VERSION_1
>* VIRTIO_F_ACCESS_PLATFORM
>* VIRTIO_F_RING_PACKED
>* VIRTIO_F_IN_ORDER
>* VIRTIO_F_ORDER_PLATFORM
>* VIRTIO_F_SR_IOV
>* VIRTIO_F_NOTIFICATION_DATA
>* VIRTIO_F_NOTIF_CONFIG_DATA
>* VIRTIO_F_RING_RESET
>and for transitional:
>* VIRTIO_F_NOTIFY_ON_EMPTY
>* VIRTIO_F_ANY_LAYOUT
>* UNUSED
>
>I would say, form these only VIRTIO_F_SR_IOV and
>VIRTIO_F_NOTIF_CONFIG_DATA look iffy in a sense things may work out
>for vhost devices without the vhost part doing something for it. And
>even there, I don't think it would hurt to make vhost part of the
>negotiation (I don't think those are supported by QEMU at this point).
>
>I would very much prefer having a consolidated and safe handling for
>these.

I completely agree on this!

>
>4) I would also argue that a bunch of the device specific feature bits
>should have vhost feature bits as well for the same reason:
>features are also such that for a vhost device, the vhost part needs
>some sort of a support.
>
>Looking through all of these would require a lot of time, so instead
>of that, let me use SCSI as an example. The features are:
>* VIRTIO_SCSI_F_INOUT
>* VIRTIO_SCSI_F_HOTPLUG
>* VIRTIO_SCSI_F_CHANGE
>* VIRTIO_SCSI_F_T10_PI
>
>The in the Linux kernel we have
>        VHOST_SCSI_FEATURES = VHOST_FEATURES | (1ULL << VIRTIO_SCSI_F_HOTPLUG) |
>                                               (1ULL << VIRTIO_SCSI_F_T10_PI)
>but in QEMU kernel_feature_bits does not have
>VIRTIO_SCSI_F_T10_PI which together does not make much sense to me. And I would
>also expect VIRTIO_SCSI_F_INOUT to be a part of the negotiation, because
>to me that the side that is processing the queue is the one that should
>care about it, but I don't think it is supported by QEMU at all, and
>then not negotiating it is fine. VIRTIO_SCSI_F_HOTPLUG is in QEMU's
>kernel_feature_bits and thus negotiated. in  So the only one that can be
>done purely in QEMU seems to be VIRTIO_SCSI_F_CHANGE.
>
>And for vsock we have
>VIRTIO_VSOCK_F_SEQPACKET and VIRTIO_VSOCK_F_STREAM, with VIRTIO_VSOCK_F_STREAM
>being strange in a sense that the spec says "If no feature bit is set,
>only stream socket type is supported. If VIRTIO_VSOCK_F_SEQPACKET has
>been negotiated, the device MAY act as if VIRTIO_VSOCK_F_STREAM has also
>been negotiated."

Yes, this was a problem we introduced with F_SEQPACKET. Unfortunately, 
when we added that feature, we didn't define F_STREAM well because the 
devices always supported it. So we had to resort to this workaround.

>
>VIRTIO_VSOCK_F_SEQPACKET is negotiated with vhost.
>
>It seems for whatever reason we just assume that the vhost device
>supports VIRTIO_VSOCK_F_STREAM and that is why we don't negotiate it. I
>would assume based on what I see that the feature VIRTIO_VSOCK_F_STREAM
>was retrofitted, and that is what we ended up with. Thus I guess now
>we have an implicit rule that any QEMU compatible vhost-vsock
>implementation would have to support that. But in practice we care only
>about Linux, and thus it does not matter.

Yep, is this, but I have in my todo to fix this somehow.

>
>5) Based on the following, I would very much prefer a per device list of
>features with the semantic "hey QEMU can do that feature without any
>specialized vhost-device support (e.g. like VIRTIO_SCSI_F_CHANGE)" over
>the current list with the semantics "these are the features that
>need vhost backend support, and anything else will just work out". That
>way if an omission is made, we would end up with the usually safer
>under-indication  instead of the current over-indication.

Makes sense to me!

>
>
>@Michael, Jason: Could you guys chime in?
>
>>
>> >
>> >But I'm pretty convinced that the current approach is fragile,
>> >especially for the feature bits form the range 24 to 40, as those are
>> >not specific to a device.
>> >
>> >BTW vsock also lacks VIRTIO_F_ACCESS_PLATFORM, and VIRTIO_F_RING_RESET
>> >as well while vhost-net has both.
>>
>> VIRTIO_F_RING_RESET is just above the line added by this patch.
>>
>
>My bad! :)
>> >
>
>> >If our design is indeed to make the individual devices responsible for
>> >having a complete list of possible features in feature_bits, then at
>> >least having a common macro for the non-device specific features would
>> >make sense to me.
>>
>> Yeah, I agree on this!
>
>I guess, "possible features" shifted towards "possible features that QEMU
>can not provide without the vhost backend".
>
>But some sort of a common list does seem viable. I would however prefer
>a can do without list, or maybe even two ("can do without because I do it
>myself and alone" and "can do without, because although I must rely on a
>vhost capability to provide that feature, the presence of that capability
>is not indicated via the feature bits and based on knowing how things are
>I'm assuming the capability is there although not specifically
>indicated".
>
>>
>> >
>> >On the other hand, I'm also very happy to send a patch which changes the
>> >behavior of vhost_get_features(), should the community decide that the
>> >current behavior does not make all that much sense -- I lean towards:
>> >probably it does not make much sense, but things like
>> >VIRTIO_F_ACCESS_PLATFORM, which are mandatory feature bits, need careful
>> >consideration, because there vhost can't do so we just won't offer it
>> >and proceed on our merry way is not the right behavior.
>> >
>> >Please comment!
>>
>> Maybe we should discuss it in another thread, but I agree that we should
>> fix it in someway. Thank you for raising this discussion!
>>
>
>Hm, I ended up replying in this thread because we still don't have
>closure on this patch. But if you think it makes, feel free to start a
>different thread, and please do include me.

I see, this thread is fine!

>
>
>> >
>> >Regards,
>> >Halil
>> >---
>> > hw/virtio/vhost-vsock-common.c | 1 +
>> > 1 file changed, 1 insertion(+)
>>
>> This patch LGTM, but as I mention we should fix other devices as well,
>> but maybe we can do with the common macro you suggested in another
>> patch.
>
>I agree, this should be definitely another patch or even series
>depending on how the discussion pans out.

Ack!

Thanks,
Stefano
Halil Pasic May 29, 2024, 12:17 p.m. UTC | #7
On Tue, 28 May 2024 11:25:51 +0800
Jason Wang <jasowang@redhat.com> wrote:

> > 5) Based on the following, I would very much prefer a per device list of
> > features with the semantic "hey QEMU can do that feature without any
> > specialized vhost-device support (e.g. like VIRTIO_SCSI_F_CHANGE)"  
> 
> Indeed the current code is kind of tricky and may need better
> documentation. But the problem is some features were datapath related
> and they are supported since the birth of a specific vhost device. For
> example, some GSO related features (actually, it's not a feature of
> vhost-net but TUN/TAP).
> 
> And I've found another interesting thing, for RING_REST, actually we
> don't need to do anything but we have the following commits:
> 
> 313389be06ff6 ("vhost-net: support VIRTIO_F_RING_RESET")
> 2a3552baafb ("vhost: vhost-kernel: enable vq reset feature")
> 
> Technically, they are not necessary as RING_RESET for vhost-kernel
> doesn't require any additional new ioctls. But it's too late to change
> as the kernel commit has been merged.
> 
> > over
> > the current list with the semantics "these are the features that
> > need vhost backend support, and anything else will just work out". That
> > way if an omission is made, we would end up with the usually safer
> > under-indication  instead of the current over-indication.
> >
> >
> > @Michael, Jason: Could you guys chime in?  
> 
> Another issue is that it seems to require a change of the semantic of
> VHOST_GET_FEATURES. If my understanding is true, it seems a
> non-trivial change which I'm not sure it's worth to bother.

I don't quite understand. Would you mind to elaborate on this?

For starters, what is the current semantic of VHOST_GET_FEATURES, and
where is it documented? You mean the ioctl, right?

Then why do you think the semantic of VHOST_GET_FEATURES should change?

IMHO changing the semantic of the VHOST_GET_FEATURES ioctl is not viable,
but also not necessary. What I am proposing is changing the (in QEMU)
logic of processing the features returned by VHOST_GET_FEATURES, while
preserving the outcomes (essentially realize the same function in a
mathematical sense, but with code that is less fragile), modulo bugs like
the one addressed with this patch of course.

Regards,
Halil
Halil Pasic May 29, 2024, 12:49 p.m. UTC | #8
On Tue, 28 May 2024 17:32:26 +0200
Stefano Garzarella <sgarzare@redhat.com> wrote:

> >1) The uses is explicitly asking for a vhost device and giving the user
> >a non vhost device is not an option.  
> 
> I didn't get this point :-( can you elaborate?

I was thinking along the lines: QEMU gets told what devices to
provision, and that includes things like what virtio features,
and what kind of a backend.

In this example, the default for vsock-vhost is no VIRTIO_F_RING_PACKED,
but if we tell QEMU to create a vsock-vhost device with the feature
VIRTIO_F_RING_PACKED, things go south in a not nice way.

Given that vhost not supporting VIRTIO_F_RING_PACKED as of today is a
fact of life we must accept, there are multiple ways how such a situation
can be handled.

For instance vhost-net is handling this by the device not offering the
VIRTIO_F_RING_PACKED feature. This is at least what I think I have
observed, but I would not mind somebody confirming it. But for the sake
of the argument, let us look at other options.

The straightforward one would be to not realize the device, because we
can't provide what we have been asked to provide. And this actually
makes me think about migration! What would happen, were we to
eventually introduce, packed to vhost and vhost net, and then attempt to
migrate between a host that has this new feature and host that has not. I
guess things would pretty much blow up in a very unpleasant way!

Then for some devices, at least in theory, it might be possible to
abandon not the feature but the backend. Along the lines we were asked to
provide the feature X with backend Y but since backend Y does not
support that feature and backed Z does, we will determistically go
with backend Z. But IMHO this is a purely theoretical consideration, and
we shall not go this way.

In any case if we are asked to provide with properties such that we
can't actually do that, something has to go out of the window: either
some of the properties, or the entire device.

Regards,
Halil
Jason Wang May 30, 2024, 2:34 a.m. UTC | #9
On Wed, May 29, 2024 at 8:18 PM Halil Pasic <pasic@linux.ibm.com> wrote:
>
> On Tue, 28 May 2024 11:25:51 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
> > > 5) Based on the following, I would very much prefer a per device list of
> > > features with the semantic "hey QEMU can do that feature without any
> > > specialized vhost-device support (e.g. like VIRTIO_SCSI_F_CHANGE)"
> >
> > Indeed the current code is kind of tricky and may need better
> > documentation. But the problem is some features were datapath related
> > and they are supported since the birth of a specific vhost device. For
> > example, some GSO related features (actually, it's not a feature of
> > vhost-net but TUN/TAP).
> >
> > And I've found another interesting thing, for RING_REST, actually we
> > don't need to do anything but we have the following commits:
> >
> > 313389be06ff6 ("vhost-net: support VIRTIO_F_RING_RESET")
> > 2a3552baafb ("vhost: vhost-kernel: enable vq reset feature")
> >
> > Technically, they are not necessary as RING_RESET for vhost-kernel
> > doesn't require any additional new ioctls. But it's too late to change
> > as the kernel commit has been merged.
> >
> > > over
> > > the current list with the semantics "these are the features that
> > > need vhost backend support, and anything else will just work out". That
> > > way if an omission is made, we would end up with the usually safer
> > > under-indication  instead of the current over-indication.
> > >
> > >
> > > @Michael, Jason: Could you guys chime in?
> >
> > Another issue is that it seems to require a change of the semantic of
> > VHOST_GET_FEATURES. If my understanding is true, it seems a
> > non-trivial change which I'm not sure it's worth to bother.
>
> I don't quite understand. Would you mind to elaborate on this?
>
> For starters, what is the current semantic of VHOST_GET_FEATURES, and
> where is it documented?

Unfortunately, no documentation. The semantic is kind of complicated
which requires the userspace to know how a specific vhost device
works.

For example, userspace knows vhost-net works with tuntap. So it checks
part of the features with vhost-net and the rest with tuntap.

> You mean the ioctl, right?

Yes.

>
> Then why do you think the semantic of VHOST_GET_FEATURES should change?

For example vhost-net can return GSO related features.

If we don't do that, I don't know how we could achieve
under-indication as you mentioned.

>
> IMHO changing the semantic of the VHOST_GET_FEATURES ioctl is not viable,
> but also not necessary. What I am proposing is changing the (in QEMU)
> logic of processing the features returned by VHOST_GET_FEATURES, while
> preserving the outcomes (essentially realize the same function in a
> mathematical sense, but with code that is less fragile), modulo bugs like
> the one addressed with this patch of course.

Ok, I think I misunderstood you here. Maybe an RFC to see?

Thanks

>
> Regards,
> Halil
>
Stefano Garzarella May 30, 2024, 7:56 a.m. UTC | #10
On Wed, May 29, 2024 at 02:49:28PM GMT, Halil Pasic wrote:
>On Tue, 28 May 2024 17:32:26 +0200
>Stefano Garzarella <sgarzare@redhat.com> wrote:
>
>> >1) The uses is explicitly asking for a vhost device and giving the user
>> >a non vhost device is not an option.
>>
>> I didn't get this point :-( can you elaborate?
>
>I was thinking along the lines: QEMU gets told what devices to
>provision, and that includes things like what virtio features,
>and what kind of a backend.
>
>In this example, the default for vsock-vhost is no VIRTIO_F_RING_PACKED,
>but if we tell QEMU to create a vsock-vhost device with the feature
>VIRTIO_F_RING_PACKED, things go south in a not nice way.
>
>Given that vhost not supporting VIRTIO_F_RING_PACKED as of today is a
>fact of life we must accept, there are multiple ways how such a situation
>can be handled.
>
>For instance vhost-net is handling this by the device not offering the
>VIRTIO_F_RING_PACKED feature. This is at least what I think I have
>observed, but I would not mind somebody confirming it. But for the sake
>of the argument, let us look at other options.
>
>The straightforward one would be to not realize the device, because we
>can't provide what we have been asked to provide. And this actually
>makes me think about migration! What would happen, were we to
>eventually introduce, packed to vhost and vhost net, and then attempt to
>migrate between a host that has this new feature and host that has not. I
>guess things would pretty much blow up in a very unpleasant way!

Yes, migration with vhost devices implies that the destination host 
supports at least the same features as the source host. We should 
consider how migration between 2 QEMUs and the destination doesn't 
support a required feature, I guess migration can't happen or the device 
has to be removed and then re-added without that feature.

>
>Then for some devices, at least in theory, it might be possible to
>abandon not the feature but the backend. Along the lines we were asked to
>provide the feature X with backend Y but since backend Y does not
>support that feature and backed Z does, we will determistically go
>with backend Z. But IMHO this is a purely theoretical consideration, and
>we shall not go this way.

Yep, I agree!

>
>In any case if we are asked to provide with properties such that we
>can't actually do that, something has to go out of the window: either
>some of the properties, or the entire device.

I see.

Thanks,
Stefano
diff mbox series

Patch

diff --git a/hw/virtio/vhost-vsock-common.c b/hw/virtio/vhost-vsock-common.c
index 12ea87d7a7..fd88df2560 100644
--- a/hw/virtio/vhost-vsock-common.c
+++ b/hw/virtio/vhost-vsock-common.c
@@ -22,6 +22,7 @@ 
 const int feature_bits[] = {
     VIRTIO_VSOCK_F_SEQPACKET,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_RING_PACKED,
     VHOST_INVALID_FEATURE_BIT
 };