[0/2] KVM: arm/arm64: Add VCPU workarounds firmware register

Message ID 20190107120537.184252-1-andre.przywara@arm.com (mailing list archive)

Message

Andre Przywara Jan. 7, 2019, 12:05 p.m. UTC
Workarounds for Spectre variant 2 or 4 vulnerabilities require some help
from the firmware, so KVM implements an interface to provide that for
guests. When such a guest is migrated, we want to make sure we don't
lose the protection the guest relies on.

This introduces two new firmware registers in KVM's GET/SET_ONE_REG
interface, so userland can save the level of protection implemented by
the hypervisor and used by the guest. Upon restoring these registers,
we make sure we don't downgrade and reject any values that would mean
weaker protection.
A table in the code describes the valid combinations.
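
To illustrate the "don't downgrade" rule, here is a minimal sketch of the
check on restore. The level names, their ordering and check_restore() are
made up for this example and are not the encoding or the table used in the
patch; the sketch only assumes that protection levels can be ordered from
weakest to strongest.

```c
#include <errno.h>

/* Hypothetical protection levels, ordered from weakest to strongest. */
enum wa_level {
	WA_NOT_AVAILABLE = 0,	/* vulnerable, no mitigation offered */
	WA_AVAILABLE     = 1,	/* vulnerable, mitigation offered */
	WA_NOT_REQUIRED  = 2,	/* not vulnerable in the first place */
};

/*
 * Hypothetical helper: decide whether a level saved on the source host
 * may be restored on this host.
 * Returns 0 if the host offers at least the saved level, -EINVAL if
 * accepting the value would mean weaker protection for the guest.
 */
static int check_restore(enum wa_level host, enum wa_level saved)
{
	if (saved > host)
		return -EINVAL;

	return 0;
}
```

With this ordering, moving a guest from an unmitigated host to a mitigated
one is accepted, while the opposite direction is rejected.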

Patch 1 implements the two firmware registers, patch 2 adds the
documentation.

This solution uses two hardcoded firmware registers for that. Not
sure if we should introduce something based on SMCCC instead, which
would allow us to report implementation of any SMCCC based service in a
generic way, or if this would be too generic.

ARM(32) is a bit of a pain (again), as the firmware register interface
is shared, but 32-bit does not implement all the workarounds.
For now I stuffed two wrappers into kvm_emulate.h, which doesn't sound
like the best solution. Happy to hear about better ideas.

This has been tested with a hack to allow faking the protection level
via a debugfs knob, then saving/restoring via some userland tool calling
the GET_ONE_REG/SET_ONE_REG ioctls.
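
For reference, the userland side of such a save/restore could look roughly
like the sketch below. It is not the tool used for the testing above. The
helper names are made up, the caller is assumed to pass a 64-bit register
ID (such as one of the new firmware registers) and already opened VCPU file
descriptors. Only struct kvm_one_reg and the KVM_GET_ONE_REG/KVM_SET_ONE_REG
ioctls come from <linux/kvm.h>.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Read a 64-bit register from a VCPU; returns 0 on success, -1 on error. */
static int get_reg_u64(int vcpu_fd, uint64_t reg_id, uint64_t *val)
{
	struct kvm_one_reg reg = {
		.id   = reg_id,
		.addr = (uint64_t)(uintptr_t)val,
	};

	return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}

/* Write a 64-bit register to a VCPU; returns 0 on success, -1 on error. */
static int set_reg_u64(int vcpu_fd, uint64_t reg_id, uint64_t val)
{
	struct kvm_one_reg reg = {
		.id   = reg_id,
		.addr = (uint64_t)(uintptr_t)&val,
	};

	return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}

/*
 * Hypothetical helper: save a register on the source VCPU and restore it
 * on the destination VCPU. The restore fails (-1, errno set by the kernel)
 * if the destination rejects the value.
 */
static int migrate_reg_u64(int src_vcpu_fd, int dst_vcpu_fd, uint64_t reg_id)
{
	uint64_t val;

	if (get_reg_u64(src_vcpu_fd, reg_id, &val) < 0)
		return -1;

	return set_reg_u64(dst_vcpu_fd, reg_id, val);
}
```

A real migration tool would transfer the saved value between hosts rather
than between two local file descriptors; the two-descriptor form just keeps
the sketch short.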

Please have a look and comment!

Cheers,
Andre

Andre Przywara (2):
  KVM: arm/arm64: Add save/restore support for firmware workaround state
  KVM: doc: add API documentation on the KVM_REG_ARM_WORKAROUNDS
    register

 Documentation/virtual/kvm/arm/psci.txt |  20 ++++
 arch/arm/include/asm/kvm_emulate.h     |  10 ++
 arch/arm/include/uapi/asm/kvm.h        |   9 ++
 arch/arm64/include/asm/kvm_emulate.h   |  14 +++
 arch/arm64/include/uapi/asm/kvm.h      |   9 ++
 virt/kvm/arm/psci.c                    | 138 ++++++++++++++++++++++++-
 6 files changed, 198 insertions(+), 2 deletions(-)

Comments

Dave Martin Jan. 22, 2019, 10:17 a.m. UTC | #1
On Mon, Jan 07, 2019 at 12:05:35PM +0000, Andre Przywara wrote:
> Workarounds for Spectre variant 2 or 4 vulnerabilities require some help
> from the firmware, so KVM implements an interface to provide that for
> guests. When such a guest is migrated, we want to make sure we don't
> lose the protection the guest relies on.
> 
> This introduces two new firmware registers in KVM's GET/SET_ONE_REG
> interface, so userland can save the level of protection implemented by
> the hypervisor and used by the guest. Upon restoring these registers,
> we make sure we don't downgrade and reject any values that would mean
> weaker protection.

Just trolling here, but could we treat these as immutable, like the ID
registers?  

We don't support migration between nodes that are "too different" in any
case, so I wonder if adding complex logic to compare vulnerabilities and
workarounds is liable to create more problems than it solves...

Do we know of anyone who explicitly needs this flexibility yet?

[...]

Cheers
---Dave

Andre Przywara Jan. 22, 2019, 10:41 a.m. UTC | #2
On Tue, 22 Jan 2019 10:17:00 +0000
Dave Martin <Dave.Martin@arm.com> wrote:

> On Mon, Jan 07, 2019 at 12:05:35PM +0000, Andre Przywara wrote:
> > Workarounds for Spectre variant 2 or 4 vulnerabilities require some
> > help from the firmware, so KVM implements an interface to provide
> > that for guests. When such a guest is migrated, we want to make
> > sure we don't lose the protection the guest relies on.
> > 
> > This introduces two new firmware registers in KVM's GET/SET_ONE_REG
> > interface, so userland can save the level of protection implemented
> > by the hypervisor and used by the guest. Upon restoring these
> > registers, we make sure we don't downgrade and reject any values
> > that would mean weaker protection.  
> 
> Just trolling here, but could we treat these as immutable, like the ID
> registers?  
> 
> We don't support migration between nodes that are "too different" in
> any case, so I wonder if adding complex logic to compare
> vulnerabilities and workarounds is liable to create more problems
> than it solves...

That is a good point, and we should make sure this doesn't get
out of hand here. Indeed it is not clear yet how many users really
want to migrate between hosts with a different CPU or platform.
But ...
 
> Do we know of anyone who explicitly needs this flexibility yet?

I think there is a good use case to migrate from a vulnerable host
to one which implements mitigations or isn't vulnerable in the first
place, in which case we want to allow migrations. The scenario here
would probably be to migrate VMs away, update the firmware, reboot
the host and migrate the VMs back.

For the other direction (increasing vulnerability) we deny it here,
which is in line with what you have in mind?

Cheers,
Andre.

Marc Zyngier Jan. 22, 2019, 11:11 a.m. UTC | #3
On Tue, 22 Jan 2019 10:17:00 +0000,
Dave Martin <Dave.Martin@arm.com> wrote:
> 
> On Mon, Jan 07, 2019 at 12:05:35PM +0000, Andre Przywara wrote:
> > Workarounds for Spectre variant 2 or 4 vulnerabilities require some help
> > from the firmware, so KVM implements an interface to provide that for
> > guests. When such a guest is migrated, we want to make sure we don't
> > lose the protection the guest relies on.
> > 
> > This introduces two new firmware registers in KVM's GET/SET_ONE_REG
> > interface, so userland can save the level of protection implemented by
> > the hypervisor and used by the guest. Upon restoring these registers,
> > we make sure we don't downgrade and reject any values that would mean
> > weaker protection.
> 
> Just trolling here, but could we treat these as immutable, like the ID
> registers?  
> 
> We don't support migration between nodes that are "too different" in any
> case, so I wonder if adding complex logic to compare vulnerabilities and
> workarounds is liable to create more problems than it solves...

And that's exactly the case we're trying to avoid. Two instances of
the same HW. One with firmware mitigations, one without. Migrating in
one direction is perfectly safe, migrating in the other isn't.

It is not about migrating to different HW at all.

	M.

Dave Martin Jan. 22, 2019, 1:56 p.m. UTC | #4
On Tue, Jan 22, 2019 at 11:11:09AM +0000, Marc Zyngier wrote:
> On Tue, 22 Jan 2019 10:17:00 +0000,
> Dave Martin <Dave.Martin@arm.com> wrote:
> > 
> > On Mon, Jan 07, 2019 at 12:05:35PM +0000, Andre Przywara wrote:
> > > Workarounds for Spectre variant 2 or 4 vulnerabilities require some help
> > > from the firmware, so KVM implements an interface to provide that for
> > > guests. When such a guest is migrated, we want to make sure we don't
> > > lose the protection the guest relies on.
> > > 
> > > This introduces two new firmware registers in KVM's GET/SET_ONE_REG
> > > interface, so userland can save the level of protection implemented by
> > > the hypervisor and used by the guest. Upon restoring these registers,
> > > we make sure we don't downgrade and reject any values that would mean
> > > weaker protection.
> > 
> > Just trolling here, but could we treat these as immutable, like the ID
> > registers?  
> > 
> > We don't support migration between nodes that are "too different" in any
> > case, so I wonder if adding complex logic to compare vulnerabilities and
> > workarounds is liable to create more problems than it solves...
> 
> And that's exactly the case we're trying to avoid. Two instances of
> the same HW. One with firmware mitigations, one without. Migrating in
> one direction is perfectly safe, migrating in the other isn't.
> 
> It is not about migrating to different HW at all.

So this is a realistic scenario when deploying a firmware update across
a cluster that has homogeneous hardware -- there will temporarily be
different firmware versions running on different nodes?

My concern is really "will the checking be too buggy / untested in
practice to be justified by the use case".

I'll take a closer look at the checking logic.

Cheers
---Dave

Marc Zyngier Jan. 22, 2019, 2:51 p.m. UTC | #5
On Tue, 22 Jan 2019 13:56:34 +0000,
Dave Martin <Dave.Martin@arm.com> wrote:
> 
> On Tue, Jan 22, 2019 at 11:11:09AM +0000, Marc Zyngier wrote:
> > On Tue, 22 Jan 2019 10:17:00 +0000,
> > Dave Martin <Dave.Martin@arm.com> wrote:
> > > 
> > > On Mon, Jan 07, 2019 at 12:05:35PM +0000, Andre Przywara wrote:
> > > > Workarounds for Spectre variant 2 or 4 vulnerabilities require some help
> > > > from the firmware, so KVM implements an interface to provide that for
> > > > guests. When such a guest is migrated, we want to make sure we don't
> > > > lose the protection the guest relies on.
> > > > 
> > > > This introduces two new firmware registers in KVM's GET/SET_ONE_REG
> > > > interface, so userland can save the level of protection implemented by
> > > > the hypervisor and used by the guest. Upon restoring these registers,
> > > > we make sure we don't downgrade and reject any values that would mean
> > > > weaker protection.
> > > 
> > > Just trolling here, but could we treat these as immutable, like the ID
> > > registers?  
> > > 
> > > We don't support migration between nodes that are "too different" in any
> > > case, so I wonder if adding complex logic to compare vulnerabilities and
> > > workarounds is liable to create more problems than it solves...
> > 
> > And that's exactly the case we're trying to avoid. Two instances of
> > the same HW. One with firmware mitigations, one without. Migrating in
> > one direction is perfectly safe, migrating in the other isn't.
> > 
> > It is not about migrating to different HW at all.
> 
> So this is a realistic scenario when deploying a firmware update across
> a cluster that has homogeneous hardware -- there will temporarily be
> different firmware versions running on different nodes?

Case in point: I have on my desk two AMD Seattle systems. One with an
ancient firmware that doesn't mitigate anything, and one that has all
the mitigations applied (and correctly advertised). I can migrate
stuff back and forth, and that's really bad.

What people do in their data centre is none of my business,
really. What concerns me is that there is a potential for something
bad to happen without people noticing. And it is KVM's job to do the
right thing in this case.

> My concern is really "will the checking be too buggy / untested in
> practice to be justified by the use case".

Not doing anything is not going to make the current situation "less
buggy". We have all the stuff we need to test this. We can even
artificially create the various scenarios on a model.

> I'll take a closer look at the checking logic.

Thanks,

	M.

Dave Martin Jan. 22, 2019, 3:28 p.m. UTC | #6
On Tue, Jan 22, 2019 at 02:51:11PM +0000, Marc Zyngier wrote:
> On Tue, 22 Jan 2019 13:56:34 +0000,
> Dave Martin <Dave.Martin@arm.com> wrote:
> > 
> > On Tue, Jan 22, 2019 at 11:11:09AM +0000, Marc Zyngier wrote:
> > > On Tue, 22 Jan 2019 10:17:00 +0000,
> > > Dave Martin <Dave.Martin@arm.com> wrote:
> > > > 
> > > > On Mon, Jan 07, 2019 at 12:05:35PM +0000, Andre Przywara wrote:
> > > > > Workarounds for Spectre variant 2 or 4 vulnerabilities require some help
> > > > > from the firmware, so KVM implements an interface to provide that for
> > > > > guests. When such a guest is migrated, we want to make sure we don't
> > > > > lose the protection the guest relies on.
> > > > > 
> > > > > This introduces two new firmware registers in KVM's GET/SET_ONE_REG
> > > > > interface, so userland can save the level of protection implemented by
> > > > > the hypervisor and used by the guest. Upon restoring these registers,
> > > > > we make sure we don't downgrade and reject any values that would mean
> > > > > weaker protection.
> > > > 
> > > > Just trolling here, but could we treat these as immutable, like the ID
> > > > registers?  
> > > > 
> > > > We don't support migration between nodes that are "too different" in any
> > > > case, so I wonder if adding complex logic to compare vulnerabilities and
> > > > workarounds is liable to create more problems than it solves...
> > > 
> > > And that's exactly the case we're trying to avoid. Two instances of
> > > the same HW. One with firmware mitigations, one without. Migrating in
> > > one direction is perfectly safe, migrating in the other isn't.
> > > 
> > > It is not about migrating to different HW at all.
> > 
> > So this is a realistic scenario when deploying a firmware update across
> > a cluster that has homogeneous hardware -- there will temporarily be
> > different firmware versions running on different nodes?
> 
> Case in point: I have on my desk two AMD Seattle systems. One with an
> ancient firmware that doesn't mitigate anything, and one that has all
> the mitigations applied (and correctly advertised). I can migrate
> stuff back and forth, and that's really bad.

Agreed.

> What people do in their data centre is none of my business,
> really. What concerns me is that there is a potential for something
> bad to happen without people noticing. And it is KVM's job to do the
> right thing in this case.

Fair enough.

> > My concern is really "will the checking be too buggy / untested in
> > practice to be justified by the use case".
> 
> Not doing anything is not going to make the current situation "less
> buggy". We have all the stuff we need to test this. We can even
> artificially create the various scenarios on a model.

Agreed.  My concern is about how this will scale if future
vulnerabilities are added to the mix.  We might ultimately end up in a
worse mess, but I may be being paranoid.

> > I'll take a closer look at the checking logic.

See the other thread.  I have an idea there for exposing the information
in a different way that may simplify things (or be totally misguided...)

Cheers
---Dave