[0/2] vmgenid: add generation counter

Message ID 20220803134147.31073-1-bchalios@amazon.es (mailing list archive)

Message

Babis Chalios Aug. 3, 2022, 1:41 p.m. UTC
From: Babis Chalios <bchalios@amazon.es>

VM generation ID exposes a GUID inside the VM that changes every time
the VM is restored. Typically, the guest kernel uses this GUID to
re-seed its internal PRNG. As a result, this value cannot be exposed to
guest user-space as a notification mechanism for VM restore events.

This patch set extends vmgenid with a 32-bit generation counter whose
purpose is to serve as a VM restore notification mechanism for guest
user-space.

It is true that such a counter could be implemented entirely by the
guest kernel, but this would rely on the vmgenid ACPI notification to
trigger the counter update, which is inherently racy. Exposing this
through the monitor allows the updated value to be in place before
the vCPUs are resumed, so interested user-space code can (atomically)
observe the update without relying on the ACPI notification.

Babis Chalios (2):
  vmgenid: make device data size configurable
  vmgenid: add generation counter

 docs/specs/vmgenid.txt    | 101 ++++++++++++++++++--------
 hw/acpi/vmgenid.c         | 145 +++++++++++++++++++++++++++++++-------
 include/hw/acpi/vmgenid.h |  23 ++++--
 3 files changed, 204 insertions(+), 65 deletions(-)

Comments

Michael S. Tsirkin Aug. 3, 2022, 3:36 p.m. UTC | #1
On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
> From: Babis Chalios <bchalios@amazon.es>
> 
> VM generation ID exposes a GUID inside the VM which changes every time a
> VM restore is happening. Typically, this GUID is used by the guest
> kernel to re-seed its internal PRNG. As a result, this value cannot be
> exposed in guest user-space as a notification mechanism for VM restore
> events.
> 
> This patch set extends vmgenid to introduce a 32 bits generation counter
> whose purpose is to be used as a VM restore notification mechanism for
> the guest user-space.
> 
> It is true that such a counter could be implemented entirely by the
> guest kernel, but this would rely on the vmgenid ACPI notification to
> trigger the counter update, which is inherently racy. Exposing this
> through the monitor allows the updated value to be in-place before
> resuming the vcpus, so interested user-space code can (atomically)
> observe the update without relying on the ACPI notification.

Producing another 4 bytes is not really the issue; the issue
is how the guest consumes it.
So I would like this discussion to happen on the Linux kernel mailing
list, not just here. Can you post the Linux patch, please?




> Babis Chalios (2):
>   vmgenid: make device data size configurable
>   vmgenid: add generation counter
> 
>  docs/specs/vmgenid.txt    | 101 ++++++++++++++++++--------
>  hw/acpi/vmgenid.c         | 145 +++++++++++++++++++++++++++++++-------
>  include/hw/acpi/vmgenid.h |  23 ++++--
>  3 files changed, 204 insertions(+), 65 deletions(-)
> 
> -- 
> 2.37.1
> 
> Amazon Spain Services sociedad limitada unipersonal, Calle Ramirez de Prado 5, 28045 Madrid. Registro Mercantil de Madrid . Tomo 22458 . Folio 102 . Hoja M-401234 . CIF B84570936
Babis Chalios Aug. 3, 2022, 4:17 p.m. UTC | #2
On 8/3/22 5:36 PM, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
> > From: Babis Chalios <bchalios@amazon.es>
> >
> > VM generation ID exposes a GUID inside the VM which changes every time a
> > VM restore is happening. Typically, this GUID is used by the guest
> > kernel to re-seed its internal PRNG. As a result, this value cannot be
> > exposed in guest user-space as a notification mechanism for VM restore
> > events.
> >
> > This patch set extends vmgenid to introduce a 32 bits generation counter
> > whose purpose is to be used as a VM restore notification mechanism for
> > the guest user-space.
> >
> > It is true that such a counter could be implemented entirely by the
> > guest kernel, but this would rely on the vmgenid ACPI notification to
> > trigger the counter update, which is inherently racy. Exposing this
> > through the monitor allows the updated value to be in-place before
> > resuming the vcpus, so interested user-space code can (atomically)
> > observe the update without relying on the ACPI notification.
> 
> Producing another 4 bytes is not really the issue, the issue
> is how does guest consume this.
> So I would like this discussion to happen on the linux kernel mailing
> list not just here.  Can you post the linux patch please?
> 

I CCed you on the Linux patch thread.

> 
> 
> 
> > Babis Chalios (2):
> >    vmgenid: make device data size configurable
> >    vmgenid: add generation counter
> >
> >   docs/specs/vmgenid.txt    | 101 ++++++++++++++++++--------
> >   hw/acpi/vmgenid.c         | 145 +++++++++++++++++++++++++++++++-------
> >   include/hw/acpi/vmgenid.h |  23 ++++--
> >   3 files changed, 204 insertions(+), 65 deletions(-)
> >
> > --
> > 2.37.1
> >
Daniel P. Berrangé Aug. 3, 2022, 4:26 p.m. UTC | #3
On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
> From: Babis Chalios <bchalios@amazon.es>
> 
> VM generation ID exposes a GUID inside the VM which changes every time a
> VM restore is happening. Typically, this GUID is used by the guest
> kernel to re-seed its internal PRNG. As a result, this value cannot be
> exposed in guest user-space as a notification mechanism for VM restore
> events.
> 
> This patch set extends vmgenid to introduce a 32 bits generation counter
> whose purpose is to be used as a VM restore notification mechanism for
> the guest user-space.
> 
> It is true that such a counter could be implemented entirely by the
> guest kernel, but this would rely on the vmgenid ACPI notification to
> trigger the counter update, which is inherently racy. Exposing this
> through the monitor allows the updated value to be in-place before
> resuming the vcpus, so interested user-space code can (atomically)
> observe the update without relying on the ACPI notification.

The VM generation ID feature in QEMU implements a spec defined
by Microsoft. It is implemented in Hyper-V, VMware, QEMU and possibly
more. This series proposes a QEMU-specific variant, which means
Linux running on all these other hypervisor platforms won't benefit
from the change. If the counter were provided entirely in the guest
kernel, it would work across all hypervisors.

It feels like the kernel ought to provide an implementation itself
as a starting point, with this QEMU change merely being an optional
enhancement to close the race window.

Ideally there would be someone at Microsoft we could connect with to
propose they include this feature in a VM Gen ID spec update, but I
don't personally know who to contact about that kind of thing. A
spec update would increase the chances that this change gets provided
across all hypervisors.

With regards,
Daniel
Babis Chalios Aug. 4, 2022, 9:54 a.m. UTC | #4
Hi Daniel,

On 3/8/22 18:26, Daniel P. Berrangé wrote:
> On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
>> From: Babis Chalios <bchalios@amazon.es>
>>
>> VM generation ID exposes a GUID inside the VM which changes every time a
>> VM restore is happening. Typically, this GUID is used by the guest
>> kernel to re-seed its internal PRNG. As a result, this value cannot be
>> exposed in guest user-space as a notification mechanism for VM restore
>> events.
>>
>> This patch set extends vmgenid to introduce a 32 bits generation counter
>> whose purpose is to be used as a VM restore notification mechanism for
>> the guest user-space.
>>
>> It is true that such a counter could be implemented entirely by the
>> guest kernel, but this would rely on the vmgenid ACPI notification to
>> trigger the counter update, which is inherently racy. Exposing this
>> through the monitor allows the updated value to be in-place before
>> resuming the vcpus, so interested user-space code can (atomically)
>> observe the update without relying on the ACPI notification.
> The VM generation ID feature in QEMU is implementing a spec defined
> by Microsoft. It is implemented in HyperV, VMWare, QEMU and possibly
> more. This series is proposing a QEMU specific variant, which means
> Linux running on all these other hypervisor platforms won't benefit
> from the change. If the counter were provided entirely in the guest
> kernel, then it works across all hypervisors.
>
> It feels like the kernel ought to provide an implementation itself
> as a starting point, with this QEMU change merely being an optional
> enhancement to close the race window.
>
> Ideally there would be someone at Microsoft we could connect with to
> propose they include this feature in a VM Gen ID spec update, but I
> don't personally know who to contact about that kind of thing. A
> spec update would increase chances that this change gets provided
> across all hypervisors.

You are right, this *is* out-of-spec. The approach here is based on various
discussions that happened last year, when we first tried to upstream this,
and more recently when vmgenid landed in Linux. I find this summary:
https://lkml.org/lkml/2022/3/1/693 quite to the point. (CCing Jason to
get his take on the matter.)

This series comes together with a Linux counterpart:
https://lkml.org/lkml/2022/8/3/563, where the generation counter is
exposed to user-space as a misc device. There, I tried to make the
generation counter "optional", in the sense that if it is not there, the
ACPI device should not fail, exactly because, for the moment, this is
not in the spec and hypervisors might not want to implement it.
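
One way a guest could treat the counter as optional is to decide from
the size of the device data whether it is present. The sketch below
illustrates that idea only; it is not taken from the Linux patch. The
layout (a 16-byte GUID followed by a 32-bit little-endian counter) and
the names vmgenid_view and vmgenid_parse are assumptions made for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VMGENID_GUID_LEN 16u /* the existing GUID is 128 bits wide */

/* Hypothetical parsed view of the device data. */
struct vmgenid_view {
    uint8_t guid[VMGENID_GUID_LEN];
    bool has_counter;  /* false when the hypervisor exposes only the GUID */
    uint32_t counter;  /* valid only if has_counter is true */
};

/*
 * Parse `len` bytes of device data. Assumed layout (not from the spec):
 * GUID first, then an optional 32-bit little-endian generation counter.
 * Returns false only when even the GUID is missing.
 */
bool vmgenid_parse(const uint8_t *data, size_t len, struct vmgenid_view *out)
{
    assert(data != NULL && out != NULL);

    if (len < VMGENID_GUID_LEN) {
        return false;
    }
    memcpy(out->guid, data, VMGENID_GUID_LEN);

    /* A missing counter is not an error: fall back to GUID-only mode. */
    out->has_counter = len >= VMGENID_GUID_LEN + sizeof(uint32_t);
    out->counter = 0;
    if (out->has_counter) {
        const uint8_t *p = data + VMGENID_GUID_LEN;
        out->counter = (uint32_t)p[0] |
                       ((uint32_t)p[1] << 8) |
                       ((uint32_t)p[2] << 16) |
                       ((uint32_t)p[3] << 24);
    }
    return true;
}
```

With this shape, a hypervisor that implements only the existing spec
still yields a usable GUID, and the counter is simply reported as absent.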

However, I think that changing the spec will take time and this is a
real issue affecting real use-cases, so we should start somewhere.

Cheers,
Babis


>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

Daniel P. Berrangé Aug. 4, 2022, 10:02 a.m. UTC | #5
On Thu, Aug 04, 2022 at 11:54:05AM +0200, Chalios, Babis wrote:
> Hi Daniel,
> 
> On 3/8/22 18:26, Daniel P. Berrangé wrote:
> > On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
> > > From: Babis Chalios <bchalios@amazon.es>
> > > 
> > > VM generation ID exposes a GUID inside the VM which changes every time a
> > > VM restore is happening. Typically, this GUID is used by the guest
> > > kernel to re-seed its internal PRNG. As a result, this value cannot be
> > > exposed in guest user-space as a notification mechanism for VM restore
> > > events.
> > > 
> > > This patch set extends vmgenid to introduce a 32 bits generation counter
> > > whose purpose is to be used as a VM restore notification mechanism for
> > > the guest user-space.
> > > 
> > > It is true that such a counter could be implemented entirely by the
> > > guest kernel, but this would rely on the vmgenid ACPI notification to
> > > trigger the counter update, which is inherently racy. Exposing this
> > > through the monitor allows the updated value to be in-place before
> > > resuming the vcpus, so interested user-space code can (atomically)
> > > observe the update without relying on the ACPI notification.
> > The VM generation ID feature in QEMU is implementing a spec defined
> > by Microsoft. It is implemented in HyperV, VMWare, QEMU and possibly
> > more. This series is proposing a QEMU specific variant, which means
> > Linux running on all these other hypervisor platforms won't benefit
> > from the change. If the counter were provided entirely in the guest
> > kernel, then it works across all hypervisors.
> > 
> > It feels like the kernel ought to provide an implementation itself
> > as a starting point, with this QEMU change merely being an optional
> > enhancement to close the race window.
> > 
> > Ideally there would be someone at Microsoft we could connect with to
> > propose they include this feature in a VM Gen ID spec update, but I
> > don't personally know who to contact about that kind of thing. A
> > spec update would increase chances that this change gets provided
> > across all hypervisors.
> 
> You are right, this *is* out-of-spec. The approach here is based on various
> discussions happened last year when we first tried to upstream and more
> recently when vmgenid landed in Linux. I find that this summary:
> https://lkml.org/lkml/2022/3/1/693 quite to the point. (CCing Jason to
> have his take on the matter).
> 
> This series comes together with a Linux counterpart:
> https://lkml.org/lkml/2022/8/3/563, where the generation counter is
> exposed to user-space as a misc device. There, I tried to make the
> generation counter "optional", in the sense that if it is not there, the
> ACPI device should not fail, exactly because, for the moment, this is
> not in the spec and hypervisors might not want to implement it.
> 
> However, I think that changing the spec will take time and this is a
> real issue affecting real use-cases, so we should start from somewhere.

I know a spec change can take time, but has there been any effort
at all to start that process since this was first discussed a year ago?

If these race condition issues are supposedly so serious that we need
to do this without waiting for a spec, then what is the answer for the
masses of users running Linux on VMware or Hyper-V/Azure?

With regards,
Daniel
Babis Chalios Aug. 4, 2022, 10:17 a.m. UTC | #6
On 4/8/22 12:02, Daniel P. Berrangé wrote:
> On Thu, Aug 04, 2022 at 11:54:05AM +0200, Chalios, Babis wrote:
>> Hi Daniel,
>>
>> On 3/8/22 18:26, Daniel P. Berrangé wrote:
>>> On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
>>>> From: Babis Chalios <bchalios@amazon.es>
>>>>
>>>> VM generation ID exposes a GUID inside the VM which changes every time a
>>>> VM restore is happening. Typically, this GUID is used by the guest
>>>> kernel to re-seed its internal PRNG. As a result, this value cannot be
>>>> exposed in guest user-space as a notification mechanism for VM restore
>>>> events.
>>>>
>>>> This patch set extends vmgenid to introduce a 32 bits generation counter
>>>> whose purpose is to be used as a VM restore notification mechanism for
>>>> the guest user-space.
>>>>
>>>> It is true that such a counter could be implemented entirely by the
>>>> guest kernel, but this would rely on the vmgenid ACPI notification to
>>>> trigger the counter update, which is inherently racy. Exposing this
>>>> through the monitor allows the updated value to be in-place before
>>>> resuming the vcpus, so interested user-space code can (atomically)
>>>> observe the update without relying on the ACPI notification.
>>> The VM generation ID feature in QEMU is implementing a spec defined
>>> by Microsoft. It is implemented in HyperV, VMWare, QEMU and possibly
>>> more. This series is proposing a QEMU specific variant, which means
>>> Linux running on all these other hypervisor platforms won't benefit
>>> from the change. If the counter were provided entirely in the guest
>>> kernel, then it works across all hypervisors.
>>>
>>> It feels like the kernel ought to provide an implementation itself
>>> as a starting point, with this QEMU change merely being an optional
>>> enhancement to close the race window.
>>>
>>> Ideally there would be someone at Microsoft we could connect with to
>>> propose they include this feature in a VM Gen ID spec update, but I
>>> don't personally know who to contact about that kind of thing. A
>>> spec update would increase chances that this change gets provided
>>> across all hypervisors.
>> You are right, this *is* out-of-spec. The approach here is based on various
>> discussions happened last year when we first tried to upstream and more
>> recently when vmgenid landed in Linux. I find that this summary:
>> https://lkml.org/lkml/2022/3/1/693 quite to the point. (CCing Jason to
>> have his take on the matter).
>>
>> This series comes together with a Linux counterpart:
>> https://lkml.org/lkml/2022/8/3/563, where the generation counter is
>> exposed to user-space as a misc device. There, I tried to make the
>> generation counter "optional", in the sense that if it is not there, the
>> ACPI device should not fail, exactly because, for the moment, this is
>> not in the spec and hypervisors might not want to implement it.
>>
>> However, I think that changing the spec will take time and this is a
>> real issue affecting real use-cases, so we should start from somewhere.
> I know a spec change can take time, but has there even been any effort
> at all to try to start that action since first discussed a year ago ?

These patch sets are out precisely to start the conversation on adding
this to the spec. As you mentioned, it would be great if we could get the
opinion of someone at Microsoft on this.

>
> If these race condition issues are supposedly so serious that we need
> to do this without waiting for a spec, then what is the answer for the
> masses of users running Linux on VMware or HyperV/Azure ?

The problem arises when you start snapshotting and restoring VMs,
so not everyone is affected by the issue. The use-cases interested in
this are those that manage fleets of VMs running code that relies on
user/kernel-space PRNGs, or network-facing services using UUIDs, for
example.

>
> With regards,
> Daniel
>

Cheers,
Babis
Babis Chalios Aug. 4, 2022, 1:31 p.m. UTC | #7
On 4/8/22 12:17, Chalios, Babis wrote:
>
>
> On 4/8/22 12:02, Daniel P. Berrangé wrote:
>> On Thu, Aug 04, 2022 at 11:54:05AM +0200, Chalios, Babis wrote:
>>> Hi Daniel,
>>>
>>> On 3/8/22 18:26, Daniel P. Berrangé wrote:
>>>> On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
>>>>> From: Babis Chalios <bchalios@amazon.es>
>>>>>
>>>>> VM generation ID exposes a GUID inside the VM which changes every 
>>>>> time a
>>>>> VM restore is happening. Typically, this GUID is used by the guest
>>>>> kernel to re-seed its internal PRNG. As a result, this value 
>>>>> cannot be
>>>>> exposed in guest user-space as a notification mechanism for VM 
>>>>> restore
>>>>> events.
>>>>>
>>>>> This patch set extends vmgenid to introduce a 32 bits generation 
>>>>> counter
>>>>> whose purpose is to be used as a VM restore notification mechanism 
>>>>> for
>>>>> the guest user-space.
>>>>>
>>>>> It is true that such a counter could be implemented entirely by the
>>>>> guest kernel, but this would rely on the vmgenid ACPI notification to
>>>>> trigger the counter update, which is inherently racy. Exposing this
>>>>> through the monitor allows the updated value to be in-place before
>>>>> resuming the vcpus, so interested user-space code can (atomically)
>>>>> observe the update without relying on the ACPI notification.
>>>> The VM generation ID feature in QEMU is implementing a spec defined
>>>> by Microsoft. It is implemented in HyperV, VMWare, QEMU and possibly
>>>> more. This series is proposing a QEMU specific variant, which means
>>>> Linux running on all these other hypervisor platforms won't benefit
>>>> from the change. If the counter were provided entirely in the guest
>>>> kernel, then it works across all hypervisors.
>>>>
>>>> It feels like the kernel ought to provide an implementation itself
>>>> as a starting point, with this QEMU change merely being an optional
>>>> enhancement to close the race window.
>>>>
>>>> Ideally there would be someone at Microsoft we could connect with to
>>>> propose they include this feature in a VM Gen ID spec update, but I
>>>> don't personally know who to contact about that kind of thing. A
>>>> spec update would increase chances that this change gets provided
>>>> across all hypervisors.
>>> You are right, this *is* out-of-spec. The approach here is based on 
>>> various
>>> discussions happened last year when we first tried to upstream and more
>>> recently when vmgenid landed in Linux. I find that this summary:
>>> https://lkml.org/lkml/2022/3/1/693 quite to the point. (CCing Jason to
>>> have his take on the matter).
>>>
>>> This series comes together with a Linux counterpart:
>>> https://lkml.org/lkml/2022/8/3/563, where the generation counter is
>>> exposed to user-space as a misc device. There, I tried to make the
>>> generation counter "optional", in the sense that if it is not there, 
>>> the
>>> ACPI device should not fail, exactly because, for the moment, this is
>>> not in the spec and hypervisors might not want to implement it.
>>>
>>> However, I think that changing the spec will take time and this is a
>>> real issue affecting real use-cases, so we should start from somewhere.
>> I know a spec change can take time, but has there even been any effort
>> at all to try to start that action since first discussed a year ago ?
>
> These patch-sets are out exactly for starting the conversation on adding
> this to the spec. As you mentioned, it would be great if we could get the
> opinion of someone at Microsoft on this.
>
>>
>> If these race condition issues are supposedly so serious that we need
>> to do this without waiting for a spec, then what is the answer for the
>> masses of users running Linux on VMware or HyperV/Azure ?
>
> The problem arises when you start snapshotting and restoring on VMs,
> so not everyone is affected from the issue. Use-cases interested in this
> are ones that manage fleets of VMs that run code that relies on
> user/kernel-space PRNGs or network-facing services using UUIDs, for
> example.
>
>>
>> With regards,
>> Daniel
>>
>
> Cheers,
> Babis

I am CCing Michael from Microsoft. Maybe he has some input on this.
Jason A. Donenfeld Aug. 4, 2022, 3:01 p.m. UTC | #8
Hi Babis,

On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
> From: Babis Chalios <bchalios@amazon.es>
> 
> VM generation ID exposes a GUID inside the VM which changes every time a
> VM restore is happening. Typically, this GUID is used by the guest
> kernel to re-seed its internal PRNG. As a result, this value cannot be
> exposed in guest user-space as a notification mechanism for VM restore
> events.
> 
> This patch set extends vmgenid to introduce a 32 bits generation counter
> whose purpose is to be used as a VM restore notification mechanism for
> the guest user-space.
> 
> It is true that such a counter could be implemented entirely by the
> guest kernel, but this would rely on the vmgenid ACPI notification to
> trigger the counter update, which is inherently racy. Exposing this
> through the monitor allows the updated value to be in-place before
> resuming the vcpus, so interested user-space code can (atomically)
> observe the update without relying on the ACPI notification.

As I wrote on LKML:
https://lore.kernel.org/lkml/Yuve4vuAnU85mdRY@zx2c4.com/
you seem to be rehashing something already discussed in earlier threads.
I don't think we should rush to add something like this to QEMU.

Jason
Michael Kelley (LINUX) Aug. 7, 2022, 3:39 p.m. UTC | #9
From: Chalios, Babis <bchalios@amazon.es> Sent: Thursday, August 4, 2022 6:31 AM
> 
> On 4/8/22 12:17, Chalios, Babis wrote:
> >
> > On 4/8/22 12:02, Daniel P. Berrangé wrote:
> >>
> >> On Thu, Aug 04, 2022 at 11:54:05AM +0200, Chalios, Babis wrote:
> >>> Hi Daniel,
> >>>
> >>> On 3/8/22 18:26, Daniel P. Berrangé wrote:
> >>>>
> >>>> On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
> >>>>> From: Babis Chalios <bchalios@amazon.es>
> >>>>>
> >>>>> VM generation ID exposes a GUID inside the VM which changes every
> >>>>> time a
> >>>>> VM restore is happening. Typically, this GUID is used by the guest
> >>>>> kernel to re-seed its internal PRNG. As a result, this value
> >>>>> cannot be
> >>>>> exposed in guest user-space as a notification mechanism for VM
> >>>>> restore
> >>>>> events.
> >>>>>
> >>>>> This patch set extends vmgenid to introduce a 32 bits generation
> >>>>> counter
> >>>>> whose purpose is to be used as a VM restore notification mechanism
> >>>>> for
> >>>>> the guest user-space.
> >>>>>
> >>>>> It is true that such a counter could be implemented entirely by the
> >>>>> guest kernel, but this would rely on the vmgenid ACPI notification to
> >>>>> trigger the counter update, which is inherently racy. Exposing this
> >>>>> through the monitor allows the updated value to be in-place before
> >>>>> resuming the vcpus, so interested user-space code can (atomically)
> >>>>> observe the update without relying on the ACPI notification.
> >>>> The VM generation ID feature in QEMU is implementing a spec defined
> >>>> by Microsoft. It is implemented in HyperV, VMWare, QEMU and possibly
> >>>> more. This series is proposing a QEMU specific variant, which means
> >>>> Linux running on all these other hypervisor platforms won't benefit
> >>>> from the change. If the counter were provided entirely in the guest
> >>>> kernel, then it works across all hypervisors.
> >>>>
> >>>> It feels like the kernel ought to provide an implementation itself
> >>>> as a starting point, with this QEMU change merely being an optional
> >>>> enhancement to close the race window.
> >>>>
> >>>> Ideally there would be someone at Microsoft we could connect with to
> >>>> propose they include this feature in a VM Gen ID spec update, but I
> >>>> don't personally know who to contact about that kind of thing. A
> >>>> spec update would increase chances that this change gets provided
> >>>> across all hypervisors.
> >>> You are right, this *is* out-of-spec. The approach here is based on
> >>> various
> >>> discussions happened last year when we first tried to upstream and more
> >>> recently when vmgenid landed in Linux. I find that this summary:
> >>> https://lkml.org/lkml/2022/3/1/693 quite to the point. (CCing Jason to
> >>> have his take on the matter).
> >>>
> >>> This series comes together with a Linux counterpart:
> >>> https://lkml.org/lkml/2022/8/3/563, where the generation counter is
> >>> exposed to user-space as a misc device. There, I tried to make the
> >>> generation counter "optional", in the sense that if it is not there, the
> >>> ACPI device should not fail, exactly because, for the moment, this is
> >>> not in the spec and hypervisors might not want to implement it.
> >>>
> >>> However, I think that changing the spec will take time and this is a
> >>> real issue affecting real use-cases, so we should start from somewhere.
> >> I know a spec change can take time, but has there even been any effort
> >> at all to try to start that action since first discussed a year ago ?
> >
> > These patch-sets are out exactly for starting the conversation on adding
> > this to the spec. As you mentioned, it would be great if we could get the
> > opinion of someone at Microsoft on this.
> >
> >>
> >> If these race condition issues are supposedly so serious that we need
> >> to do this without waiting for a spec, then what is the answer for the
> >> masses of users running Linux on VMware or HyperV/Azure ?
> >
> > The problem arises when you start snapshotting and restoring on VMs,
> > so not everyone is affected from the issue. Use-cases interested in this
> > are ones that manage fleets of VMs that run code that relies on
> > user/kernel-space PRNGs or network-facing services using UUIDs, for
> > example.
> >
> 
> I am CCing Michael from Microsoft. Maybe he has some input on this.

FWIW, Microsoft created the vmgenid concept and spec 10 years ago for
Windows running virtualized, with virtualized Active Directory Domain
Controllers being the primary use case.  I doubt there's anyone on the
Windows side here at Microsoft who has any further interest in the topic.
I do know there haven't been any requests to the Hyper-V team to
further enhance the vmgenid functionality.

As such, I think the Linux community is free to extend the functionality 
without conflicting with something Microsoft is doing.  I don't personally
have domain knowledge on these technical issues, so I'm acting more as
a Linux advocate with the Hyper-V team rather than a domain expert from
Microsoft.  I can circulate the Linux proposals with the Hyper-V team
to see if they have any objections.  If there is agreement on extensions to
the hypervisor/VMM interface, I can ask the Hyper-V team to implement
them in support of Linux guest requirements.

I can also try to update the original spec with the extensions so that the
spec is captured in one place, though the internal logistics of updating an
old document like the vmgenid spec can be surprisingly difficult.  At worst,
there might need to be a separate spec for the Linux-driven extensions
that is hosted elsewhere.

Michael