Message ID | 20210914131716.102851-1-dgilbert@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | virtio-balloon: Fix page-poison subsection name | expand |
On 14.09.21 15:17, Dr. David Alan Gilbert (git) wrote: > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > The subsection name for page-poison was typo'd as: > > vitio-balloon-device/page-poison > > Note the missing 'r' in virtio. > > When we have a machine type that enables page poison, and the guest > enables it (which needs a new kernel), things fail rather unpredictably. > > The fallout from this is that most of the other subsections fail to > load, including things like the feature bits in the device, one > possible fallout is that the physical addresses of the queues > then get aligned differently and we fail with an error about > last_avail_idx being wrong. > It's not obvious to me why this doesn't produce a more obvious failure, > but virtio's vmstate loading is a bit open-coded. > > Fixes: 7483cbbaf82 ("virtio-balloon: Implement support for page poison reporting feature") > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1984401 > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > --- > hw/virtio/virtio-balloon.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > index 5a69dce35d..c6962fcbfe 100644 > --- a/hw/virtio/virtio-balloon.c > +++ b/hw/virtio/virtio-balloon.c > @@ -852,7 +852,7 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = { > }; > > static const VMStateDescription vmstate_virtio_balloon_page_poison = { > - .name = "vitio-balloon-device/page-poison", > + .name = "virtio-balloon-device/page-poison", > .version_id = 1, > .minimum_version_id = 1, > .needed = virtio_balloon_page_poison_support, > Oh, that's very subtle. I wasn't even aware that the prefix really has to match the actual device ... I thought the whole idea of the prefix here was just to make the string unique ... Reviewed-by: David Hildenbrand <david@redhat.com> Thanks!
* David Hildenbrand (david@redhat.com) wrote: > On 14.09.21 15:17, Dr. David Alan Gilbert (git) wrote: > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > The subsection name for page-poison was typo'd as: > > > > vitio-balloon-device/page-poison > > > > Note the missing 'r' in virtio. > > > > When we have a machine type that enables page poison, and the guest > > enables it (which needs a new kernel), things fail rather unpredictably. > > > > The fallout from this is that most of the other subsections fail to > > load, including things like the feature bits in the device, one > > possible fallout is that the physical addresses of the queues > > then get aligned differently and we fail with an error about > > last_avail_idx being wrong. > > It's not obvious to me why this doesn't produce a more obvious failure, > > but virtio's vmstate loading is a bit open-coded. > > > > Fixes: 7483cbbaf82 ("virtio-balloon: Implement support for page poison reporting feature") > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1984401 > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > > --- > > hw/virtio/virtio-balloon.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > > index 5a69dce35d..c6962fcbfe 100644 > > --- a/hw/virtio/virtio-balloon.c > > +++ b/hw/virtio/virtio-balloon.c > > @@ -852,7 +852,7 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = { > > }; > > static const VMStateDescription vmstate_virtio_balloon_page_poison = { > > - .name = "vitio-balloon-device/page-poison", > > + .name = "virtio-balloon-device/page-poison", > > .version_id = 1, > > .minimum_version_id = 1, > > .needed = virtio_balloon_page_poison_support, > > > > Oh, that's very subtle. I wasn't even aware that the prefix really has to > match the actual device ... I thought the whole idea of the prefix here was > just to make the string unique ... Subsection naming is *very* critical; the logic is something like: 'we're loading the X device' a subsection arrives for 'N/P' if 'X==N' then it looks in X for subsection P. If 'X!=N' then it assumes we've finished loading X and P is really for an outer device that X is part of. This is horrible. Dave > Reviewed-by: David Hildenbrand <david@redhat.com> > > Thanks! > > -- > Thanks, > > David / dhildenb >
On 9/14/21 3:17 PM, Dr. David Alan Gilbert (git) wrote: > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > The subsection name for page-poison was typo'd as: > > vitio-balloon-device/page-poison > > Note the missing 'r' in virtio. > > When we have a machine type that enables page poison, and the guest > enables it (which needs a new kernel), things fail rather unpredictably. IIUC since v5.1 guests have 'page-poison'=true but once migrated they become 'page-poison'=unset=false? > The fallout from this is that most of the other subsections fail to > load, including things like the feature bits in the device, one > possible fallout is that the physical addresses of the queues > then get aligned differently and we fail with an error about > last_avail_idx being wrong. > It's not obvious to me why this doesn't produce a more obvious failure, > but virtio's vmstate loading is a bit open-coded. :/ > Fixes: 7483cbbaf82 ("virtio-balloon: Implement support for page poison reporting feature") > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1984401 > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > --- > hw/virtio/virtio-balloon.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > index 5a69dce35d..c6962fcbfe 100644 > --- a/hw/virtio/virtio-balloon.c > +++ b/hw/virtio/virtio-balloon.c > @@ -852,7 +852,7 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = { > }; > > static const VMStateDescription vmstate_virtio_balloon_page_poison = { > - .name = "vitio-balloon-device/page-poison", > + .name = "virtio-balloon-device/page-poison", > .version_id = 1, > .minimum_version_id = 1, > .needed = virtio_balloon_page_poison_support, >
On 14.09.21 15:47, Philippe Mathieu-Daudé wrote: > On 9/14/21 3:17 PM, Dr. David Alan Gilbert (git) wrote: >> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> >> >> The subsection name for page-poison was typo'd as: >> >> vitio-balloon-device/page-poison >> >> Note the missing 'r' in virtio. >> >> When we have a machine type that enables page poison, and the guest >> enables it (which needs a new kernel), things fail rather unpredictably. > > IIUC since v5.1 guests have 'page-poison'=true but once migrated > they become 'page-poison'=unset=false? We only migrate the subsection if the guest supports the feature (-> newer guest kernel). If we migrate, it's a broken migration stream. If we don't migrate, everything is fine.
On Tue, Sep 14, 2021 at 02:30:06PM +0100, Dr. David Alan Gilbert wrote: > * David Hildenbrand (david@redhat.com) wrote: > > On 14.09.21 15:17, Dr. David Alan Gilbert (git) wrote: > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > > > The subsection name for page-poison was typo'd as: > > > > > > vitio-balloon-device/page-poison > > > > > > Note the missing 'r' in virtio. > > > > > > When we have a machine type that enables page poison, and the guest > > > enables it (which needs a new kernel), things fail rather unpredictably. > > > > > > The fallout from this is that most of the other subsections fail to > > > load, including things like the feature bits in the device, one > > > possible fallout is that the physical addresses of the queues > > > then get aligned differently and we fail with an error about > > > last_avail_idx being wrong. > > > It's not obvious to me why this doesn't produce a more obvious failure, > > > but virtio's vmstate loading is a bit open-coded. > > > > > > Fixes: 7483cbbaf82 ("virtio-balloon: Implement support for page poison reporting feature") > > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1984401 > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > > > --- > > > hw/virtio/virtio-balloon.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > > > index 5a69dce35d..c6962fcbfe 100644 > > > --- a/hw/virtio/virtio-balloon.c > > > +++ b/hw/virtio/virtio-balloon.c > > > @@ -852,7 +852,7 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = { > > > }; > > > static const VMStateDescription vmstate_virtio_balloon_page_poison = { > > > - .name = "vitio-balloon-device/page-poison", > > > + .name = "virtio-balloon-device/page-poison", > > > .version_id = 1, > > > .minimum_version_id = 1, > > > .needed = virtio_balloon_page_poison_support, > > > > > > > Oh, that's very subtle. I wasn't even aware that the prefix really has to > > match the actual device ... I thought the whole idea of the prefix here was > > just to make the string unique ... > > Subsection naming is *very* critical; the logic is something like: > 'we're loading the X device' > a subsection arrives for 'N/P' > if 'X==N' then it looks in X for subsection P. > If 'X!=N' then it assumes we've finished loading X > and P is really for an outer device that X is part of. > This is horrible. Is there value in making this more explicit via a code convention for .name field initializers. eg instead of .name = "virtio-balloon-device/page-poison", Prefer .name = TYPE_VIRTIO_BALLOON "/page-poison" ? Regards, Daniel
* Daniel P. Berrangé (berrange@redhat.com) wrote: > On Tue, Sep 14, 2021 at 02:30:06PM +0100, Dr. David Alan Gilbert wrote: > > * David Hildenbrand (david@redhat.com) wrote: > > > On 14.09.21 15:17, Dr. David Alan Gilbert (git) wrote: > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > > > > > The subsection name for page-poison was typo'd as: > > > > > > > > vitio-balloon-device/page-poison > > > > > > > > Note the missing 'r' in virtio. > > > > > > > > When we have a machine type that enables page poison, and the guest > > > > enables it (which needs a new kernel), things fail rather unpredictably. > > > > > > > > The fallout from this is that most of the other subsections fail to > > > > load, including things like the feature bits in the device, one > > > > possible fallout is that the physical addresses of the queues > > > > then get aligned differently and we fail with an error about > > > > last_avail_idx being wrong. > > > > It's not obvious to me why this doesn't produce a more obvious failure, > > > > but virtio's vmstate loading is a bit open-coded. > > > > > > > > Fixes: 7483cbbaf82 ("virtio-balloon: Implement support for page poison reporting feature") > > > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1984401 > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > > > > --- > > > > hw/virtio/virtio-balloon.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > > > > index 5a69dce35d..c6962fcbfe 100644 > > > > --- a/hw/virtio/virtio-balloon.c > > > > +++ b/hw/virtio/virtio-balloon.c > > > > @@ -852,7 +852,7 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = { > > > > }; > > > > static const VMStateDescription vmstate_virtio_balloon_page_poison = { > > > > - .name = "vitio-balloon-device/page-poison", > > > > + .name = "virtio-balloon-device/page-poison", > > > > .version_id = 1, > > > > .minimum_version_id = 1, > > > > .needed = virtio_balloon_page_poison_support, > > > > > > > > > > Oh, that's very subtle. I wasn't even aware that the prefix really has to > > > match the actual device ... I thought the whole idea of the prefix here was > > > just to make the string unique ... > > > > Subsection naming is *very* critical; the logic is something like: > > 'we're loading the X device' > > a subsection arrives for 'N/P' > > if 'X==N' then it looks in X for subsection P. > > If 'X!=N' then it assumes we've finished loading X > > and P is really for an outer device that X is part of. > > This is horrible. > > Is there value in making this more explicit via a code convention > for .name field initializers. eg instead of > > .name = "virtio-balloon-device/page-poison", > > Prefer > > .name = TYPE_VIRTIO_BALLOON "/page-poison" > > ? I think it might be better to see if the vmstate code can check it and error during saving; certainly this case feels detectable, but I need to keep an eye open for all the other weird cases. Dave > > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| >
On 9/14/21 3:58 PM, Daniel P. Berrangé wrote: > On Tue, Sep 14, 2021 at 02:30:06PM +0100, Dr. David Alan Gilbert wrote: >> * David Hildenbrand (david@redhat.com) wrote: >>> On 14.09.21 15:17, Dr. David Alan Gilbert (git) wrote: >>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> >>>> >>>> The subsection name for page-poison was typo'd as: >>>> >>>> vitio-balloon-device/page-poison >>>> >>>> Note the missing 'r' in virtio. >>>> >>>> When we have a machine type that enables page poison, and the guest >>>> enables it (which needs a new kernel), things fail rather unpredictably. >>>> >>>> The fallout from this is that most of the other subsections fail to >>>> load, including things like the feature bits in the device, one >>>> possible fallout is that the physical addresses of the queues >>>> then get aligned differently and we fail with an error about >>>> last_avail_idx being wrong. >>>> It's not obvious to me why this doesn't produce a more obvious failure, >>>> but virtio's vmstate loading is a bit open-coded. >>>> >>>> Fixes: 7483cbbaf82 ("virtio-balloon: Implement support for page poison reporting feature") >>>> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1984401 >>>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> >>>> --- >>>> hw/virtio/virtio-balloon.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c >>>> index 5a69dce35d..c6962fcbfe 100644 >>>> --- a/hw/virtio/virtio-balloon.c >>>> +++ b/hw/virtio/virtio-balloon.c >>>> @@ -852,7 +852,7 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = { >>>> }; >>>> static const VMStateDescription vmstate_virtio_balloon_page_poison = { >>>> - .name = "vitio-balloon-device/page-poison", >>>> + .name = "virtio-balloon-device/page-poison", >>>> .version_id = 1, >>>> .minimum_version_id = 1, >>>> .needed = virtio_balloon_page_poison_support, >>>> >>> >>> Oh, that's very subtle. I wasn't even aware that the prefix really has to >>> match the actual device ... I thought the whole idea of the prefix here was >>> just to make the string unique ... >> >> Subsection naming is *very* critical; the logic is something like: >> 'we're loading the X device' >> a subsection arrives for 'N/P' >> if 'X==N' then it looks in X for subsection P. >> If 'X!=N' then it assumes we've finished loading X >> and P is really for an outer device that X is part of. >> This is horrible. > > Is there value in making this more explicit via a code convention > for .name field initializers. eg instead of > > .name = "virtio-balloon-device/page-poison", > > Prefer > > .name = TYPE_VIRTIO_BALLOON "/page-poison" > > ? IIUC so far only user-creatable devices are required to have a stable name in the TYPE definition (because CLI must stay stable). Which is why this type is not recommended for migration section names.
Note this also correspounds to: https://gitlab.com/qemu-project/qemu/-/issues/485
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index 5a69dce35d..c6962fcbfe 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -852,7 +852,7 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = { }; static const VMStateDescription vmstate_virtio_balloon_page_poison = { - .name = "vitio-balloon-device/page-poison", + .name = "virtio-balloon-device/page-poison", .version_id = 1, .minimum_version_id = 1, .needed = virtio_balloon_page_poison_support,