Message ID | 20210204202915.15925-1-yuri.benditovich@daynix.com (mailing list archive) |
---|---|
Headers | show |
Series | virtio-net: graceful drop of vhost for TAP | expand |
On Thu, Feb 04, 2021 at 10:29:12PM +0200, Yuri Benditovich wrote: > This set of patches introduces graceful switch from tap-vhost to > tap-no-vhost depending on guest features. Before that the features > that vhost does not support were silently cleared in get_features. > This creates potential problem of migration from the machine where > some of virtio-net features are supported by the vhost kernel to the > machine where they are not supported (packed ring as an example). I still worry that adding new features will silently disable vhost for people. Can we limit the change to when a VM is migrated in? > Instead of silent masking of the features virtio-net gracefully > disables the vhost at set_features if some features acked by the > guest contradict with kernel vhost capabilities. > > This set of patches also makes get_vhost_net() call (that used > everywhere) to always return actual result, i.e. initially it > returns non-NULL value and from the moment the vhost was disabled > the call will return NULL. Such a way we avoid any unexpected > calls to vhost functions. > Yuri Benditovich (3): > vhost-net: add VIRTIO_NET_F_HASH_REPORT to the list of kernel features > net: add ability to hide (disable) vhost_net > virtio-net: graceful fallback to vhost=off for tap netdev > > hw/net/vhost_net.c | 5 +++- > hw/net/virtio-net.c | 58 ++++++++++++++++++++++++++++++++++++++------- > include/net/net.h | 1 + > 3 files changed, 55 insertions(+), 9 deletions(-) > > -- > 2.17.1
On Tue, Feb 09, 2021 at 09:34:20AM -0500, Michael S. Tsirkin wrote: > On Thu, Feb 04, 2021 at 10:29:12PM +0200, Yuri Benditovich wrote: > > This set of patches introduces graceful switch from tap-vhost to > > tap-no-vhost depending on guest features. Before that the features > > that vhost does not support were silently cleared in get_features. > > This creates potential problem of migration from the machine where > > some of virtio-net features are supported by the vhost kernel to the > > machine where they are not supported (packed ring as an example). > > I still worry that adding new features will silently disable vhost for people. > Can we limit the change to when a VM is migrated in? Some management applications expect bi-directional live migration to work, so taking specific actions on incoming migration only feels dangerous. IMHO if the features we're adding cannot be expected to exist in host kernels in general, then the feature should defualt to off and require explicit user config to enable. Downstream distros which can guarantee newer kernels can flip the default in their custom machine types if they desire. Regards, Daniel
On Tue, Feb 09, 2021 at 02:51:05PM +0000, Daniel P. Berrangé wrote: > On Tue, Feb 09, 2021 at 09:34:20AM -0500, Michael S. Tsirkin wrote: > > On Thu, Feb 04, 2021 at 10:29:12PM +0200, Yuri Benditovich wrote: > > > This set of patches introduces graceful switch from tap-vhost to > > > tap-no-vhost depending on guest features. Before that the features > > > that vhost does not support were silently cleared in get_features. > > > This creates potential problem of migration from the machine where > > > some of virtio-net features are supported by the vhost kernel to the > > > machine where they are not supported (packed ring as an example). > > > > I still worry that adding new features will silently disable vhost for people. > > Can we limit the change to when a VM is migrated in? > > Some management applications expect bi-directional live migration to > work, so taking specific actions on incoming migration only feels > dangerous. Could you be more specific? Bi-directional migration is currently broken when migrating new kernel->old kernel. This seems to be the motivation for this patch, though I wish it was spelled out more explicitly. People don't complain much, but I'm fine with fixing that with a userspace fallback. I'd rather not force the fallback on others though: vhost is generally specified explicitly by user while features are generally set automatically, so this patch will make us override what user specified, not nice. > IMHO if the features we're adding cannot be expected to exist in > host kernels in general, then the feature should defualt to off > and require explicit user config to enable. > Downstream distros which can guarantee newer kernels can flip the > default in their custom machine types if they desire. > > Regards, > Daniel Unfortunately that will basically mean we are stuck with no new features for years. We did what this patch is trying to change for years now, in particular KVM also seems to happily disable CPU features not supported by kernel so I wonder why we can't keep doing it, with tweaks for some corner cases. userspace and kernel not being in 100% sync wrt features is not a corner case though, and switching backends seems like too big a hammer. > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Tue, Feb 09, 2021 at 10:04:30AM -0500, Michael S. Tsirkin wrote: > On Tue, Feb 09, 2021 at 02:51:05PM +0000, Daniel P. Berrangé wrote: > > On Tue, Feb 09, 2021 at 09:34:20AM -0500, Michael S. Tsirkin wrote: > > > On Thu, Feb 04, 2021 at 10:29:12PM +0200, Yuri Benditovich wrote: > > > > This set of patches introduces graceful switch from tap-vhost to > > > > tap-no-vhost depending on guest features. Before that the features > > > > that vhost does not support were silently cleared in get_features. > > > > This creates potential problem of migration from the machine where > > > > some of virtio-net features are supported by the vhost kernel to the > > > > machine where they are not supported (packed ring as an example). > > > > > > I still worry that adding new features will silently disable vhost for people. > > > Can we limit the change to when a VM is migrated in? > > > > Some management applications expect bi-directional live migration to > > work, so taking specific actions on incoming migration only feels > > dangerous. > > Could you be more specific? > > Bi-directional migration is currently broken > when migrating new kernel->old kernel. At the very least new QEMU -> old QEMU, but in general new kernel to old kerenl would be expected too. Assuming a QEMU command line is using a versioned machine type, and not using host passthrough features (-cpu host), then it should in general represent a fixed guest ABI independant of kernel version/features. > > IMHO if the features we're adding cannot be expected to exist in > > host kernels in general, then the feature should defualt to off > > and require explicit user config to enable. > > Downstream distros which can guarantee newer kernels can flip the > > default in their custom machine types if they desire. > > Unfortunately that will basically mean we are stuck with no new features > for years. We did what this patch is trying to change for years now, in > particular KVM also seems to happily disable CPU features not supported > by kernel so I wonder why we can't keep doing it, with tweaks for some > corner cases. Yep, it is not great from a upstream POV. In upstream we've traditionally not expressed a specific min kernel version we expect to be run on. So that lmits what we can do in upstram machine types. Downstram distros can expose stuff pretty much immediately if they desire. eg if RHEL 8 adds the kernel feature in 8.5, it could in theory add the feature by default in pc-q35-rhel8.5.0 machine types (if we ignore pain of containers running on old kernels). Even if we take into account mismatched usrespace/kernel, the downstram distros can adopt much more quickly. There is probably scope for upstream to be more explicit about min required kernel features, but we'll always be worse upstream because we target a broader range of platforms/distros. > userspace and kernel not being in 100% sync wrt features is not > a corner case though, and switching backends seems like too big > a hammer. Yes, the containers world in particular has made life much worse, because the mis-matched combinations of userspace and kernel are much more likely. In fact I'd say out of sync kernel+userspace is arguably going to become the common case in future. eg worst case someone running a RHEL-8 userspace on a RHEL-7 container host or vica verca. Regards, Daniel
On 2021/2/9 下午11:04, Michael S. Tsirkin wrote: > On Tue, Feb 09, 2021 at 02:51:05PM +0000, Daniel P. Berrangé wrote: >> On Tue, Feb 09, 2021 at 09:34:20AM -0500, Michael S. Tsirkin wrote: >>> On Thu, Feb 04, 2021 at 10:29:12PM +0200, Yuri Benditovich wrote: >>>> This set of patches introduces graceful switch from tap-vhost to >>>> tap-no-vhost depending on guest features. Before that the features >>>> that vhost does not support were silently cleared in get_features. >>>> This creates potential problem of migration from the machine where >>>> some of virtio-net features are supported by the vhost kernel to the >>>> machine where they are not supported (packed ring as an example). >>> I still worry that adding new features will silently disable vhost for people. >>> Can we limit the change to when a VM is migrated in? >> Some management applications expect bi-directional live migration to >> work, so taking specific actions on incoming migration only feels >> dangerous. > Could you be more specific? > > Bi-directional migration is currently broken > when migrating new kernel->old kernel. > > This seems to be the motivation for this patch, though I wish > it was spelled out more explicitly. > > People don't complain much, but I'm fine with fixing that > with a userspace fallback. > > > I'd rather not force the fallback on others though: vhost is generally > specified explicitly by user while features are generally set > automatically, so this patch will make us override what user specified, > not nice. > > >> IMHO if the features we're adding cannot be expected to exist in >> host kernels in general, then the feature should defualt to off >> and require explicit user config to enable. >> Downstream distros which can guarantee newer kernels can flip the >> default in their custom machine types if they desire. >> >> Regards, >> Daniel > Unfortunately that will basically mean we are stuck with no new features > for years. We did what this patch is trying to change for years now, in > particular KVM also seems to happily disable CPU features not supported > by kernel so I wonder why we can't keep doing it, with tweaks for some > corner cases. It's probably not the corner case. So my understanding is when a feature is turned on via command line, it should not be cleared silently otherwise we may break migration for sure. E.g when packed=on is specified, we should disable vhost instead of clear it from the device. Thanks > > userspace and kernel not being in 100% sync wrt features is not > a corner case though, and switching backends seems like too big > a hammer. > >> -- >> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| >> |: https://libvirt.org -o- https://fstop138.berrange.com :| >> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| >
On Wed, Feb 10, 2021 at 02:19:59PM +0800, Jason Wang wrote: > > On 2021/2/9 下午11:04, Michael S. Tsirkin wrote: > > On Tue, Feb 09, 2021 at 02:51:05PM +0000, Daniel P. Berrangé wrote: > > > On Tue, Feb 09, 2021 at 09:34:20AM -0500, Michael S. Tsirkin wrote: > > > > On Thu, Feb 04, 2021 at 10:29:12PM +0200, Yuri Benditovich wrote: > > > > > This set of patches introduces graceful switch from tap-vhost to > > > > > tap-no-vhost depending on guest features. Before that the features > > > > > that vhost does not support were silently cleared in get_features. > > > > > This creates potential problem of migration from the machine where > > > > > some of virtio-net features are supported by the vhost kernel to the > > > > > machine where they are not supported (packed ring as an example). > > > > I still worry that adding new features will silently disable vhost for people. > > > > Can we limit the change to when a VM is migrated in? > > > Some management applications expect bi-directional live migration to > > > work, so taking specific actions on incoming migration only feels > > > dangerous. > > Could you be more specific? > > > > Bi-directional migration is currently broken > > when migrating new kernel->old kernel. > > > > This seems to be the motivation for this patch, though I wish > > it was spelled out more explicitly. > > > > People don't complain much, but I'm fine with fixing that > > with a userspace fallback. > > > > > > I'd rather not force the fallback on others though: vhost is generally > > specified explicitly by user while features are generally set > > automatically, so this patch will make us override what user specified, > > not nice. > > > > > > > IMHO if the features we're adding cannot be expected to exist in > > > host kernels in general, then the feature should defualt to off > > > and require explicit user config to enable. > > > Downstream distros which can guarantee newer kernels can flip the > > > default in their custom machine types if they desire. > > > > > > Regards, > > > Daniel > > Unfortunately that will basically mean we are stuck with no new features > > for years. We did what this patch is trying to change for years now, in > > particular KVM also seems to happily disable CPU features not supported > > by kernel so I wonder why we can't keep doing it, with tweaks for some > > corner cases. > > > It's probably not the corner case. > > So my understanding is when a feature is turned on via command line, Most people do not play with these flags on command line, the main path to turn features on if when they default to on. > it > should not be cleared silently otherwise we may break migration for sure. Not if we are careful. Setting flags is more dangerous. Clearing is generally ok. > E.g when packed=on is specified, we should disable vhost instead of clear it > from the device. > > Thanks From usability POV, consider packed as an example, people only enable it to get better performance. libvirt says: The attribute packed controls if QEMU should try to use packed virtqueues. Compared to regular split queues, packed queues consist of only a single descriptor ring replacing available and used ring, index and descriptor buffer. This can result in better cache utilization and performance. If packed virtqueues are actually used depends on the feature negotiation between QEMU, vhost backends and guest drivers. Possible values are on or off. Switching to a completely different backend clearly isn't what user intended. > > > > > userspace and kernel not being in 100% sync wrt features is not > > a corner case though, and switching backends seems like too big > > a hammer. > > > > > -- > > > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > > > |: https://libvirt.org -o- https://fstop138.berrange.com :| > > > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| > >
On 2021/2/10 下午4:38, Michael S. Tsirkin wrote: > On Wed, Feb 10, 2021 at 02:19:59PM +0800, Jason Wang wrote: >> On 2021/2/9 下午11:04, Michael S. Tsirkin wrote: >>> On Tue, Feb 09, 2021 at 02:51:05PM +0000, Daniel P. Berrangé wrote: >>>> On Tue, Feb 09, 2021 at 09:34:20AM -0500, Michael S. Tsirkin wrote: >>>>> On Thu, Feb 04, 2021 at 10:29:12PM +0200, Yuri Benditovich wrote: >>>>>> This set of patches introduces graceful switch from tap-vhost to >>>>>> tap-no-vhost depending on guest features. Before that the features >>>>>> that vhost does not support were silently cleared in get_features. >>>>>> This creates potential problem of migration from the machine where >>>>>> some of virtio-net features are supported by the vhost kernel to the >>>>>> machine where they are not supported (packed ring as an example). >>>>> I still worry that adding new features will silently disable vhost for people. >>>>> Can we limit the change to when a VM is migrated in? >>>> Some management applications expect bi-directional live migration to >>>> work, so taking specific actions on incoming migration only feels >>>> dangerous. >>> Could you be more specific? >>> >>> Bi-directional migration is currently broken >>> when migrating new kernel->old kernel. >>> >>> This seems to be the motivation for this patch, though I wish >>> it was spelled out more explicitly. >>> >>> People don't complain much, but I'm fine with fixing that >>> with a userspace fallback. >>> >>> >>> I'd rather not force the fallback on others though: vhost is generally >>> specified explicitly by user while features are generally set >>> automatically, so this patch will make us override what user specified, >>> not nice. >>> >>> >>>> IMHO if the features we're adding cannot be expected to exist in >>>> host kernels in general, then the feature should defualt to off >>>> and require explicit user config to enable. >>>> Downstream distros which can guarantee newer kernels can flip the >>>> default in their custom machine types if they desire. >>>> >>>> Regards, >>>> Daniel >>> Unfortunately that will basically mean we are stuck with no new features >>> for years. We did what this patch is trying to change for years now, in >>> particular KVM also seems to happily disable CPU features not supported >>> by kernel so I wonder why we can't keep doing it, with tweaks for some >>> corner cases. >> >> It's probably not the corner case. >> >> So my understanding is when a feature is turned on via command line, > > Most people do not play with these flags on command line, the > main path to turn features on if when they default to on. The problem is qemu may choose to clear feature whose default are on. > >> it >> should not be cleared silently otherwise we may break migration for sure. > Not if we are careful. Setting flags is more dangerous. The problem is that user is expected to enable the feature for the device. If Qemu fail to do that, shouldn't we fail the device initialization? > Clearing is > generally ok. > >> E.g when packed=on is specified, we should disable vhost instead of clear it >> from the device. >> >> Thanks > From usability POV, consider packed as an example, people only enable it > to get better performance. libvirt says: > > > The attribute packed controls if QEMU should try to use packed > virtqueues. Compared to regular split queues, packed queues consist of > only a single descriptor ring replacing available and used ring, index > and descriptor buffer. This can result in better cache utilization and > performance. If packed virtqueues are actually used depends on the > feature negotiation between QEMU, vhost backends and guest drivers. > Possible values are on or off. > > Switching to a completely different backend clearly isn't what user intended. If we don't do migration, all should be fine. But the problem is the migration compatibility. My understanding is that libvirt will assume the compatibility if the same user-visible feature were set through cli. So if any feature were cleared the migration compatibility will break. This will also be a burden to support cross vendor live migration in the future for vDPA. Thanks > >>> userspace and kernel not being in 100% sync wrt features is not >>> a corner case though, and switching backends seems like too big >>> a hammer. >>> >>>> -- >>>> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| >>>> |: https://libvirt.org -o- https://fstop138.berrange.com :| >>>> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Tue, Feb 09, 2021 at 10:04:30AM -0500, Michael S. Tsirkin wrote: > On Tue, Feb 09, 2021 at 02:51:05PM +0000, Daniel P. Berrangé wrote: > > On Tue, Feb 09, 2021 at 09:34:20AM -0500, Michael S. Tsirkin wrote: > > > On Thu, Feb 04, 2021 at 10:29:12PM +0200, Yuri Benditovich wrote: > > > > This set of patches introduces graceful switch from tap-vhost to > > > > tap-no-vhost depending on guest features. Before that the features > > > > that vhost does not support were silently cleared in get_features. > > > > This creates potential problem of migration from the machine where > > > > some of virtio-net features are supported by the vhost kernel to the > > > > machine where they are not supported (packed ring as an example). > > > > > > I still worry that adding new features will silently disable vhost for people. > > > Can we limit the change to when a VM is migrated in? > > > > Some management applications expect bi-directional live migration to > > work, so taking specific actions on incoming migration only feels > > dangerous. > > Could you be more specific? > > Bi-directional migration is currently broken > when migrating new kernel->old kernel. > > This seems to be the motivation for this patch, though I wish > it was spelled out more explicitly. > > People don't complain much, but I'm fine with fixing that > with a userspace fallback. > > > I'd rather not force the fallback on others though: vhost is generally > specified explicitly by user while features are generally set > automatically, so this patch will make us override what user specified, > not nice. > > > > IMHO if the features we're adding cannot be expected to exist in > > host kernels in general, then the feature should defualt to off > > and require explicit user config to enable. > > Downstream distros which can guarantee newer kernels can flip the > > default in their custom machine types if they desire. > > > > Regards, > > Daniel > > Unfortunately that will basically mean we are stuck with no new features > for years. We did what this patch is trying to change for years now, in > particular KVM also seems to happily disable CPU features not supported > by kernel so I wonder why we can't keep doing it, with tweaks for some > corner cases. I should say the kernel's continual changing in CPU features that are exposed has been responsible for a *huge* number of bugs with live migration compatibility. libvirt, QEMU & apps have needed to introduce a lot of extra code to try to cope with the changing CPU features across migration and i still goes wrong to this very day, because we have to migrate from prehistoric QEMU versions to quite modern versions. IOW, the CPU features approach is a perfect example of why we should *not* introduce a kernel dependancy in more areas of QEMU feature enablement, and instead should strictly tie feature defaults to the machine type versions. Regards, Daniel
On Wed, Feb 10, 2021 at 02:19:59PM +0800, Jason Wang wrote: > > On 2021/2/9 下午11:04, Michael S. Tsirkin wrote: > > On Tue, Feb 09, 2021 at 02:51:05PM +0000, Daniel P. Berrangé wrote: > > > On Tue, Feb 09, 2021 at 09:34:20AM -0500, Michael S. Tsirkin wrote: > > > > On Thu, Feb 04, 2021 at 10:29:12PM +0200, Yuri Benditovich wrote: > > > > > This set of patches introduces graceful switch from tap-vhost to > > > > > tap-no-vhost depending on guest features. Before that the features > > > > > that vhost does not support were silently cleared in get_features. > > > > > This creates potential problem of migration from the machine where > > > > > some of virtio-net features are supported by the vhost kernel to the > > > > > machine where they are not supported (packed ring as an example). > > > > I still worry that adding new features will silently disable vhost for people. > > > > Can we limit the change to when a VM is migrated in? > > > Some management applications expect bi-directional live migration to > > > work, so taking specific actions on incoming migration only feels > > > dangerous. > > Could you be more specific? > > > > Bi-directional migration is currently broken > > when migrating new kernel->old kernel. > > > > This seems to be the motivation for this patch, though I wish > > it was spelled out more explicitly. > > > > People don't complain much, but I'm fine with fixing that > > with a userspace fallback. > > > > > > I'd rather not force the fallback on others though: vhost is generally > > specified explicitly by user while features are generally set > > automatically, so this patch will make us override what user specified, > > not nice. > > > > > > > IMHO if the features we're adding cannot be expected to exist in > > > host kernels in general, then the feature should defualt to off > > > and require explicit user config to enable. > > > Downstream distros which can guarantee newer kernels can flip the > > > default in their custom machine types if they desire. > > > > > > Regards, > > > Daniel > > Unfortunately that will basically mean we are stuck with no new features > > for years. We did what this patch is trying to change for years now, in > > particular KVM also seems to happily disable CPU features not supported > > by kernel so I wonder why we can't keep doing it, with tweaks for some > > corner cases. > > > It's probably not the corner case. > > So my understanding is when a feature is turned on via command line, it > should not be cleared silently otherwise we may break migration for sure. > > E.g when packed=on is specified, we should disable vhost instead of clear it > from the device. If something is explicitly turned on by the user, they expect that feature to be honoured, or an error to be raised. If something is not explicitly turned on by the user, the behaviour wrt the default should be stable for any given machine type version. IOW, if you disable vhost by default when packed=on is set, then you can't later switch to letting vhost be enabled with packed=on, unless you tie that change to a new machine type. If the user has explicitly said packed=on *and* vhost=on, then should must honour that, or raise an error if the combination is unsupportable. Silently disabling vhost, then vhost=on is not ok. Regards, Daniel
On Thu, Feb 18, 2021 at 11:35 AM Daniel P. Berrangé <berrange@redhat.com> wrote: > On Wed, Feb 10, 2021 at 02:19:59PM +0800, Jason Wang wrote: > > > > On 2021/2/9 下午11:04, Michael S. Tsirkin wrote: > > > On Tue, Feb 09, 2021 at 02:51:05PM +0000, Daniel P. Berrangé wrote: > > > > On Tue, Feb 09, 2021 at 09:34:20AM -0500, Michael S. Tsirkin wrote: > > > > > On Thu, Feb 04, 2021 at 10:29:12PM +0200, Yuri Benditovich wrote: > > > > > > This set of patches introduces graceful switch from tap-vhost to > > > > > > tap-no-vhost depending on guest features. Before that the > features > > > > > > that vhost does not support were silently cleared in > get_features. > > > > > > This creates potential problem of migration from the machine > where > > > > > > some of virtio-net features are supported by the vhost kernel to > the > > > > > > machine where they are not supported (packed ring as an example). > > > > > I still worry that adding new features will silently disable vhost > for people. > > > > > Can we limit the change to when a VM is migrated in? > > > > Some management applications expect bi-directional live migration to > > > > work, so taking specific actions on incoming migration only feels > > > > dangerous. > > > Could you be more specific? > > > > > > Bi-directional migration is currently broken > > > when migrating new kernel->old kernel. > > > > > > This seems to be the motivation for this patch, though I wish > > > it was spelled out more explicitly. > > > > > > People don't complain much, but I'm fine with fixing that > > > with a userspace fallback. > > > > > > > > > I'd rather not force the fallback on others though: vhost is generally > > > specified explicitly by user while features are generally set > > > automatically, so this patch will make us override what user specified, > > > not nice. > > > > > > > > > > IMHO if the features we're adding cannot be expected to exist in > > > > host kernels in general, then the feature should defualt to off > > > > and require explicit user config to enable. > > > > Downstream distros which can guarantee newer kernels can flip the > > > > default in their custom machine types if they desire. > > > > > > > > Regards, > > > > Daniel > > > Unfortunately that will basically mean we are stuck with no new > features > > > for years. We did what this patch is trying to change for years now, in > > > particular KVM also seems to happily disable CPU features not supported > > > by kernel so I wonder why we can't keep doing it, with tweaks for some > > > corner cases. > > > > > > It's probably not the corner case. > > > > So my understanding is when a feature is turned on via command line, it > > should not be cleared silently otherwise we may break migration for sure. > > > > E.g when packed=on is specified, we should disable vhost instead of > clear it > > from the device. > > If something is explicitly turned on by the user, they expect that feature > to be honoured, or an error to be raised. > > If something is not explicitly turned on by the user, the behaviour wrt the > default should be stable for any given machine type version. > > IOW, if you disable vhost by default when packed=on is set, then you can't > later switch to letting vhost be enabled with packed=on, unless you tie > that change to a new machine type. > > If the user has explicitly said packed=on *and* vhost=on, then should > must honour that, or raise an error if the combination is unsupportable. > Silently disabling vhost, then vhost=on is not ok. > If I'm not mistaken: Inside qemu there is no possibility to determine whether the user explicitly turned vhost on. For qemu the vhost is off by default but libvirt creates a new profile with vhost on. > > Regards, > Daniel > -- > |: https://berrange.com -o- > https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- > https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- > https://www.instagram.com/dberrange :| > >
On Thu, Feb 18, 2021 at 09:55:25PM +0200, Yuri Benditovich wrote: > On Thu, Feb 18, 2021 at 11:35 AM Daniel P. Berrangé <berrange@redhat.com> > wrote: > > > On Wed, Feb 10, 2021 at 02:19:59PM +0800, Jason Wang wrote: > > > > > > On 2021/2/9 下午11:04, Michael S. Tsirkin wrote: > > > > On Tue, Feb 09, 2021 at 02:51:05PM +0000, Daniel P. Berrangé wrote: > > > > > On Tue, Feb 09, 2021 at 09:34:20AM -0500, Michael S. Tsirkin wrote: > > > > > > On Thu, Feb 04, 2021 at 10:29:12PM +0200, Yuri Benditovich wrote: > > > > > > > This set of patches introduces graceful switch from tap-vhost to > > > > > > > tap-no-vhost depending on guest features. Before that the > > features > > > > > > > that vhost does not support were silently cleared in > > get_features. > > > > > > > This creates potential problem of migration from the machine > > where > > > > > > > some of virtio-net features are supported by the vhost kernel to > > the > > > > > > > machine where they are not supported (packed ring as an example). > > > > > > I still worry that adding new features will silently disable vhost > > for people. > > > > > > Can we limit the change to when a VM is migrated in? > > > > > Some management applications expect bi-directional live migration to > > > > > work, so taking specific actions on incoming migration only feels > > > > > dangerous. > > > > Could you be more specific? > > > > > > > > Bi-directional migration is currently broken > > > > when migrating new kernel->old kernel. > > > > > > > > This seems to be the motivation for this patch, though I wish > > > > it was spelled out more explicitly. > > > > > > > > People don't complain much, but I'm fine with fixing that > > > > with a userspace fallback. > > > > > > > > > > > > I'd rather not force the fallback on others though: vhost is generally > > > > specified explicitly by user while features are generally set > > > > automatically, so this patch will make us override what user specified, > > > > not nice. > > > > > > > > > > > > > IMHO if the features we're adding cannot be expected to exist in > > > > > host kernels in general, then the feature should defualt to off > > > > > and require explicit user config to enable. > > > > > Downstream distros which can guarantee newer kernels can flip the > > > > > default in their custom machine types if they desire. > > > > > > > > > > Regards, > > > > > Daniel > > > > Unfortunately that will basically mean we are stuck with no new > > features > > > > for years. We did what this patch is trying to change for years now, in > > > > particular KVM also seems to happily disable CPU features not supported > > > > by kernel so I wonder why we can't keep doing it, with tweaks for some > > > > corner cases. > > > > > > > > > It's probably not the corner case. > > > > > > So my understanding is when a feature is turned on via command line, it > > > should not be cleared silently otherwise we may break migration for sure. > > > > > > E.g when packed=on is specified, we should disable vhost instead of > > clear it > > > from the device. > > > > If something is explicitly turned on by the user, they expect that feature > > to be honoured, or an error to be raised. > > > > If something is not explicitly turned on by the user, the behaviour wrt the > > default should be stable for any given machine type version. > > > > IOW, if you disable vhost by default when packed=on is set, then you can't > > later switch to letting vhost be enabled with packed=on, unless you tie > > that change to a new machine type. > > > > If the user has explicitly said packed=on *and* vhost=on, then should > > must honour that, or raise an error if the combination is unsupportable. > > Silently disabling vhost, then vhost=on is not ok. > > > > If I'm not mistaken: > Inside qemu there is no possibility to determine whether the user > explicitly turned vhost on. > For qemu the vhost is off by default but libvirt creates a new profile with > vhost on. Yes, libvirt will always attempt to enable vhost if it is present on te host with /dev/vhost-net, except where the user explicitly told us not to. Regards, Daniel