
[RFC,0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs

Message ID 20200922004024.3699923-1-vipinsh@google.com (mailing list archive)

Message

Vipin Sharma Sept. 22, 2020, 12:40 a.m. UTC
Hello,

This patch series adds a new SEV controller for tracking and limiting
the usage of SEV ASIDs on the AMD SVM platform.

SEV ASIDs are used in creating encrypted VMs and lightweight sandboxes,
but this resource is available in very limited quantity on a host.

This limited quantity creates issues like SEV ASID starvation and
suboptimal scheduling in the cloud infrastructure.

The SEV controller provides SEV ASID tracking and resource control
mechanisms.
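
As a rough sketch of the shape of the interface (illustrative names only;
see patch 1 for the actual implementation), the controller boils down to a
charge/uncharge pair that sev.c calls when an ASID is allocated or freed:

#include <linux/atomic.h>
#include <linux/cgroup.h>
#include <linux/errno.h>

/* Illustrative sketch only; function and field names are hypothetical. */
struct sev_cgroup {
	struct cgroup_subsys_state css;
	atomic64_t usage;	/* ASIDs currently charged to this cgroup */
	atomic64_t max;		/* limit written to sev.max */
};

/* Called before an SEV ASID is handed out to a VM. */
static int sev_cgroup_try_charge(struct sev_cgroup *cg)
{
	if (atomic64_add_return(1, &cg->usage) > atomic64_read(&cg->max)) {
		atomic64_dec(&cg->usage);
		return -EBUSY;	/* cgroup is at its sev.max limit */
	}
	return 0;
}

/* Called when the ASID is released at VM teardown. */
static void sev_cgroup_uncharge(struct sev_cgroup *cg)
{
	atomic64_dec(&cg->usage);
}

A real controller also walks up the hierarchy and charges every ancestor;
the flat version above only shows the shape of the hooks.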

Patch 1 - Overview, motivation, and implementation details of the SEV
          controller.
Patch 2 - Kernel documentation of the SEV controller for both
	  cgroup v1 and v2.

Thanks

Vipin Sharma (2):
  KVM: SVM: Create SEV cgroup controller.
  KVM: SVM: SEV cgroup controller documentation

 Documentation/admin-guide/cgroup-v1/sev.rst |  94 +++++
 Documentation/admin-guide/cgroup-v2.rst     |  56 ++-
 arch/x86/kvm/Makefile                       |   1 +
 arch/x86/kvm/svm/sev.c                      |  16 +-
 arch/x86/kvm/svm/sev_cgroup.c               | 414 ++++++++++++++++++++
 arch/x86/kvm/svm/sev_cgroup.h               |  40 ++
 include/linux/cgroup_subsys.h               |   3 +
 init/Kconfig                                |  14 +
 8 files changed, 634 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/admin-guide/cgroup-v1/sev.rst
 create mode 100644 arch/x86/kvm/svm/sev_cgroup.c
 create mode 100644 arch/x86/kvm/svm/sev_cgroup.h

Comments

Sean Christopherson Sept. 22, 2020, 1:48 a.m. UTC | #1
On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
> Hello,
> 
> This patch series adds a new SEV controller for tracking and limiting
> the usage of SEV ASIDs on the AMD SVM platform.
> 
> SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
> but this resource is in very limited quantity on a host.
> 
> This limited quantity creates issues like SEV ASID starvation and
> unoptimized scheduling in the cloud infrastructure.
> 
> SEV controller provides SEV ASID tracking and resource control
> mechanisms.

This should be genericized to not be SEV specific.  TDX has a similar
scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
(gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
would change anything, I think it'd just be a bunch of renaming.  The hardest
part would probably be figuring out a name :-).

Another idea would be to go even more generic and implement a KVM cgroup
that accounts the number of VMs of a particular type, e.g. legacy, SEV,
SEV-ES?, and TDX.  That has potential future problems though as it falls
apart if hardware ever supports 1:MANY VMs:KEYS, or if there is a need to
account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
light of day.
Vipin Sharma Sept. 22, 2020, 9:14 p.m. UTC | #2
On Mon, Sep 21, 2020 at 06:48:38PM -0700, Sean Christopherson wrote:
> On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
> > Hello,
> > 
> > This patch series adds a new SEV controller for tracking and limiting
> > the usage of SEV ASIDs on the AMD SVM platform.
> > 
> > SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
> > but this resource is in very limited quantity on a host.
> > 
> > This limited quantity creates issues like SEV ASID starvation and
> > unoptimized scheduling in the cloud infrastructure.
> > 
> > SEV controller provides SEV ASID tracking and resource control
> > mechanisms.
> 
> This should be genericized to not be SEV specific.  TDX has a similar
> scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
> (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
> would change anything, I think it'd just be a bunch of renaming.  The hardest
> part would probably be figuring out a name :-).
> 
> Another idea would be to go even more generic and implement a KVM cgroup
> that accounts the number of VMs of a particular type, e.g. legacy, SEV,
> SEV-ES?, and TDX.  That has potential future problems though as it falls
> apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
> account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
> light of day.

I read about TDX and its use of KeyIDs for encrypting VMs. TDX has two
kinds of KeyIDs: private and shared.

On the AMD platform there are two types of ASIDs for encryption.
1. SEV ASID - Normal runtime guest memory encryption.
2. SEV-ES ASID - Extends SEV ASID by adding register state encryption with
		 integrity.

Both types of ASIDs have their own maximum value, which is provisioned in
the firmware.

So, we are talking about 4 different types of resources:
1. AMD SEV ASID (implemented in this patch as sev.* files in SEV cgroup)
2. AMD SEV-ES ASID (in future, adding files like sev_es.*)
3. Intel TDX private KeyID
4. Intel TDX shared KeyID

The TDX private KeyID is similar to the SEV and SEV-ES ASIDs. I think
coming up with a single name that can be used by both platforms, and that
remains extensible for future enhancements, will not be easy. This will
get even more difficult if Arm also comes up with something similar but
with different nuances.

I like the idea of the KVM cgroup; when it is mounted, it will have
different files based on the hardware platform.

1. KVM cgroup on AMD will have:
sev.max & sev.current.
sev_es.max & sev_es.current.

2. KVM cgroup mounted on Intel:
tdx_private_keys.max
tdx_shared_keys.max

The KVM cgroup can contain both generic control files (no use case comes
to my mind right now) and hardware-platform-specific files.
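
As a very rough sketch of how the AMD-side files could be wired up with the
existing cgroup cftype machinery (only the file names come from the list
above; the struct layout and handler names are illustrative assumptions):

#include <linux/cgroup.h>
#include <linux/kernel.h>

/* Hypothetical per-cgroup state; the real layout would come from the patch. */
struct kvm_cgroup {
	struct cgroup_subsys_state css;
	u64 sev_current;
	u64 sev_es_current;
};

static u64 sev_current_read(struct cgroup_subsys_state *css,
			    struct cftype *cft)
{
	return container_of(css, struct kvm_cgroup, css)->sev_current;
}

static u64 sev_es_current_read(struct cgroup_subsys_state *css,
			       struct cftype *cft)
{
	return container_of(css, struct kvm_cgroup, css)->sev_es_current;
}

/*
 * Registered only on SEV-capable hosts; an Intel host would register a
 * different array (tdx_private_keys.*, tdx_shared_keys.*) instead.
 */
static struct cftype amd_files[] = {
	{ .name = "sev.current",    .read_u64 = sev_current_read    },
	{ .name = "sev_es.current", .read_u64 = sev_es_current_read },
	/* sev.max / sev_es.max writers omitted for brevity */
	{ }	/* terminator */
};
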
Paolo Bonzini Sept. 23, 2020, 12:47 p.m. UTC | #3
On 22/09/20 03:48, Sean Christopherson wrote:
> This should be genericized to not be SEV specific.  TDX has a similar
> scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
> (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
> would change anything, I think it'd just be a bunch of renaming.  The hardest
> part would probably be figuring out a name :-).
> 
> Another idea would be to go even more generic and implement a KVM cgroup
> that accounts the number of VMs of a particular type, e.g. legacy, SEV,
> SEV-ES?, and TDX.  That has potential future problems though as it falls
> apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
> account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
> light of day.

Or also MANY:1 (we are thinking of having multiple VMs share the same
SEV ASID).

It might even be the same on s390 and PPC, in which case we probably
want to implement this in virt/kvm.  Paul, Janosch, do you think this
would make sense for you?  The original commit message is below.

Paolo

> On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
>> Hello,
>>
>> This patch series adds a new SEV controller for tracking and limiting
>> the usage of SEV ASIDs on the AMD SVM platform.
>>
>> SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
>> but this resource is in very limited quantity on a host.
>>
>> This limited quantity creates issues like SEV ASID starvation and
>> unoptimized scheduling in the cloud infrastructure.
>>
>> SEV controller provides SEV ASID tracking and resource control
>> mechanisms.
Tom Lendacky Sept. 24, 2020, 7:55 p.m. UTC | #4
On 9/24/20 2:21 PM, Sean Christopherson wrote:
> On Tue, Sep 22, 2020 at 02:14:04PM -0700, Vipin Sharma wrote:
>> On Mon, Sep 21, 2020 at 06:48:38PM -0700, Sean Christopherson wrote:
>>> On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
>>>> Hello,
>>>>
>>>> This patch series adds a new SEV controller for tracking and limiting
>>>> the usage of SEV ASIDs on the AMD SVM platform.
>>>>
>>>> SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
>>>> but this resource is in very limited quantity on a host.
>>>>
>>>> This limited quantity creates issues like SEV ASID starvation and
>>>> unoptimized scheduling in the cloud infrastructure.
>>>>
>>>> SEV controller provides SEV ASID tracking and resource control
>>>> mechanisms.
>>>
>>> This should be genericized to not be SEV specific.  TDX has a similar
>>> scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
>>> (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
>>> would change anything, I think it'd just be a bunch of renaming.  The hardest
>>> part would probably be figuring out a name :-).
>>>
>>> Another idea would be to go even more generic and implement a KVM cgroup
>>> that accounts the number of VMs of a particular type, e.g. legacy, SEV,
>>> SEV-ES?, and TDX.  That has potential future problems though as it falls
>>> apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
>>> account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
>>> light of day.
>>
>> I read about the TDX and its use of the KeyID for encrypting VMs. TDX
>> has two kinds of KeyIDs private and shared.
> 
> To clarify, "shared" KeyIDs are simply legacy MKTME KeyIDs.  This is relevant
> because those KeyIDs can be used without TDX or KVM in the picture.
> 
>> On AMD platform there are two types of ASIDs for encryption.
>> 1. SEV ASID - Normal runtime guest memory encryption.
>> 2. SEV-ES ASID - Extends SEV ASID by adding register state encryption with
>> 		 integrity.
>>
>> Both types of ASIDs have their own maximum value which is provisioned in
>> the firmware
> 
> Ugh, I missed that detail in the SEV-ES RFC.  Does SNP add another ASID type,
> or does it reuse SEV-ES ASIDs?  If it does add another type, is that trend
> expected to continue, i.e. will SEV end up with SEV, SEV-ES, SEV-ES-SNP,
> SEV-ES-SNP-X, SEV-ES-SNP-X-Y, etc...?

SEV-SNP and SEV-ES share the same ASID range.

Thanks,
Tom

> 
>> So, we are talking about 4 different types of resources:
>> 1. AMD SEV ASID (implemented in this patch as sev.* files in SEV cgroup)
>> 2. AMD SEV-ES ASID (in future, adding files like sev_es.*)
>> 3. Intel TDX private KeyID
>> 4. Intel TDX shared KeyID
>>
>> TDX private KeyID is similar to SEV and SEV-ES ASID. I think coming up
>> with the same name which can be used by both platforms will not be easy,
>> and extensible with the future enhancements. This will get even more
>> difficult if Arm also comes up with something similar but with different
>> nuances.
> 
> Honest question, what's easier for userspace/orchestration layers?  Having an
> abstract but common name, or concrete but different names?  My gut reaction is
> to provide a common interface, but I can see how that could do more harm than
> good, e.g. some amount of hardware capability discovery is possible with
> concrete names.  And I'm guessing there's already a fair amount of vendor
> specific knowledge bleeding into userspace for these features...
> 
> And if SNP is adding another ASID namespace, trying to abstract the types is
> probably a lost cause.
> 
>  From a code perspective, I doubt it will matter all that much, e.g. it should
> be easy enough to provide helpers for exposing a new asid/key type.
> 
>> I like the idea of the KVM cgroup and when it is mounted it will have
>> different files based on the hardware platform.
> 
> I don't think a KVM cgroup is the correct approach, e.g. there are potential
> use cases for "legacy" MKTME without KVM.  Maybe something like Encryption
> Keys cgroup?
> 
>> 1. KVM cgroup on AMD will have:
>> sev.max & sev.current.
>> sev_es.max & sev_es.current.
>>
>> 2. KVM cgroup mounted on Intel:
>> tdx_private_keys.max
>> tdx_shared_keys.max
>>
>> The KVM cgroup can be used to have control files which are generic (no
>> use case in my mind right now) and hardware platform specific files
>> also.
> 
> My "generic KVM cgroup" suggestion was probably a pretty bad suggestion.
> Except for ASIDs/KeyIDs, KVM itself doesn't manage any constrained resources,
> e.g. memory, logical CPUs, time slices, etc... are all generic resources that
> are consumed by KVM but managed elsewhere.  We definitely don't want to change
> that, nor do I think we want to do anything, such as creating a KVM cgroup,
> that would imply that having KVM manage resources is a good idea.
>
Vipin Sharma Sept. 25, 2020, 10:22 p.m. UTC | #5
On Thu, Sep 24, 2020 at 02:55:18PM -0500, Tom Lendacky wrote:
> On 9/24/20 2:21 PM, Sean Christopherson wrote:
> > On Tue, Sep 22, 2020 at 02:14:04PM -0700, Vipin Sharma wrote:
> > > On Mon, Sep 21, 2020 at 06:48:38PM -0700, Sean Christopherson wrote:
> > > > On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
> > > > > Hello,
> > > > > 
> > > > > This patch series adds a new SEV controller for tracking and limiting
> > > > > the usage of SEV ASIDs on the AMD SVM platform.
> > > > > 
> > > > > SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
> > > > > but this resource is in very limited quantity on a host.
> > > > > 
> > > > > This limited quantity creates issues like SEV ASID starvation and
> > > > > unoptimized scheduling in the cloud infrastructure.
> > > > > 
> > > > > SEV controller provides SEV ASID tracking and resource control
> > > > > mechanisms.
> > > > 
> > > > This should be genericized to not be SEV specific.  TDX has a similar
> > > > scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
> > > > (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
> > > > would change anything, I think it'd just be a bunch of renaming.  The hardest
> > > > part would probably be figuring out a name :-).
> > > > 
> > > > Another idea would be to go even more generic and implement a KVM cgroup
> > > > that accounts the number of VMs of a particular type, e.g. legacy, SEV,
> > > > SEV-ES?, and TDX.  That has potential future problems though as it falls
> > > > apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
> > > > account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
> > > > light of day.
> > > 
> > > I read about the TDX and its use of the KeyID for encrypting VMs. TDX
> > > has two kinds of KeyIDs private and shared.
> > 
> > To clarify, "shared" KeyIDs are simply legacy MKTME KeyIDs.  This is relevant
> > because those KeyIDs can be used without TDX or KVM in the picture.
> > 
> > > On AMD platform there are two types of ASIDs for encryption.
> > > 1. SEV ASID - Normal runtime guest memory encryption.
> > > 2. SEV-ES ASID - Extends SEV ASID by adding register state encryption with
> > > 		 integrity.
> > > 
> > > Both types of ASIDs have their own maximum value which is provisioned in
> > > the firmware
> > 
> > Ugh, I missed that detail in the SEV-ES RFC.  Does SNP add another ASID type,
> > or does it reuse SEV-ES ASIDs?  If it does add another type, is that trend
> > expected to continue, i.e. will SEV end up with SEV, SEV-ES, SEV-ES-SNP,
> > SEV-ES-SNP-X, SEV-ES-SNP-X-Y, etc...?
> 
> SEV-SNP and SEV-ES share the same ASID range.
> 
> Thanks,
> Tom
> 
> > 
> > > So, we are talking about 4 different types of resources:
> > > 1. AMD SEV ASID (implemented in this patch as sev.* files in SEV cgroup)
> > > 2. AMD SEV-ES ASID (in future, adding files like sev_es.*)
> > > 3. Intel TDX private KeyID
> > > 4. Intel TDX shared KeyID
> > > 
> > > TDX private KeyID is similar to SEV and SEV-ES ASID. I think coming up
> > > with the same name which can be used by both platforms will not be easy,
> > > and extensible with the future enhancements. This will get even more
> > > difficult if Arm also comes up with something similar but with different
> > > nuances.
> > 
> > Honest question, what's easier for userspace/orchestration layers?  Having an
> > abstract but common name, or conrete but different names?  My gut reaction is
> > to provide a common interface, but I can see how that could do more harm than
> > good, e.g. some amount of hardware capabilitiy discovery is possible with
> > concrete names.  And I'm guessing there's already a fair amount of vendor
> > specific knowledge bleeding into userspace for these features...

I agree with you that the abstract name is better than the concrete
name, but I also feel that we must provide HW extensions. Here is one
approach:

Cgroup name: cpu_encryption, encryption_slots, or memcrypt (open to
suggestions)

Control files: slots.{max, current, events}

Contents of slots.max:
default max
	default: Corresponds to all kinds of encryption capabilities on
		 a platform. For AMD, it will be SEV and SEV-ES.  For
		 Intel, it will be TDX and MKTME. This can also be used
		 by other devices not just CPU.

	max: max or any number to denote limit on the cgroup.

A user who wants finer control needs to know about the capabilities a
platform provides and use them, e.g. on AMD:

$ echo "sev-es 1000" > slot.max
$ cat slots.max
default max sev-es 1000

This means that, in this cgroup, the maximum number of SEV-ES ASIDs that
can be used is 1000, while SEV ASIDs stay at max (the default, no limit).
Each platform can provide its own extensions, which can be overridden by a
user; otherwise, extensions will have the default limit.

This is similar in spirit to the io and rdma controllers.

I think this keeps the abstraction for userspace while also providing
finer control over HW-specific features.

What do you think about the above approach?  
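
To make the comparison with the io/rdma controllers concrete, a write to
slots.max could be parsed roughly as below (one "type value" pair per write,
"default" handling omitted; struct layout and names are illustrative
assumptions, not a real implementation):

#include <linux/cgroup.h>
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/string.h>

struct slots_cgroup {
	struct cgroup_subsys_state css;
	u64 sev_max;
	u64 sev_es_max;
};

static ssize_t slots_max_write(struct kernfs_open_file *of, char *buf,
			       size_t nbytes, loff_t off)
{
	struct slots_cgroup *cg = container_of(of_css(of),
					       struct slots_cgroup, css);
	char *type, *value;
	u64 limit;

	buf = strstrip(buf);
	type = strsep(&buf, " ");	/* e.g. "sev-es" */
	value = buf;			/* e.g. "1000" or "max" */
	if (!type || !value)
		return -EINVAL;

	if (!strcmp(value, "max"))
		limit = U64_MAX;
	else if (kstrtou64(value, 0, &limit))
		return -EINVAL;

	if (!strcmp(type, "sev"))
		cg->sev_max = limit;
	else if (!strcmp(type, "sev-es"))
		cg->sev_es_max = limit;
	else
		return -EINVAL;		/* unknown platform extension */

	return nbytes;
}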

> > 
> > And if SNP is adding another ASID namespace, trying to abstract the types is
> > probably a lost cause.
> > 
> >  From a code perspective, I doubt it will matter all that much, e.g. it should
> > be easy enough to provide helpers for exposing a new asid/key type.
> > 
> > > I like the idea of the KVM cgroup and when it is mounted it will have
> > > different files based on the hardware platform.
> > 
> > I don't think a KVM cgroup is the correct approach, e.g. there are potential
> > use cases for "legacy" MKTME without KVM.  Maybe something like Encryption
> > Keys cgroup?

I added some suggestions in the approach above. I think we should not
include "keys" in the name because the limitation here is on the number of
keys that can be used simultaneously.

> > 
> > > 1. KVM cgroup on AMD will have:
> > > sev.max & sev.current.
> > > sev_es.max & sev_es.current.
> > > 
> > > 2. KVM cgroup mounted on Intel:
> > > tdx_private_keys.max
> > > tdx_shared_keys.max
> > > 
> > > The KVM cgroup can be used to have control files which are generic (no
> > > use case in my mind right now) and hardware platform specific files
> > > also.
> > 
> > My "generic KVM cgroup" suggestion was probably a pretty bad suggestion.
> > Except for ASIDs/KeyIDs, KVM itself doesn't manage any constrained resources,
> > e.g. memory, logical CPUs, time slices, etc... are all generic resources that
> > are consumed by KVM but managed elsewhere.  We definitely don't want to change
> > that, nor do I think we want to do anything, such as creating a KVM cgroup,
> > that would imply that having KVM manage resources is a good idea.
> >
Janosch Frank Sept. 28, 2020, 9:12 a.m. UTC | #6
On 9/23/20 2:47 PM, Paolo Bonzini wrote:
> On 22/09/20 03:48, Sean Christopherson wrote:
>> This should be genericized to not be SEV specific.  TDX has a similar
>> scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
>> (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
>> would change anything, I think it'd just be a bunch of renaming.  The hardest
>> part would probably be figuring out a name :-).
>>
>> Another idea would be to go even more generic and implement a KVM cgroup
>> that accounts the number of VMs of a particular type, e.g. legacy, SEV,
>> SEV-ES?, and TDX.  That has potential future problems though as it falls
>> apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
>> account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
>> light of day.
> 
> Or also MANY:1 (we are thinking of having multiple VMs share the same
> SEV ASID).
> 
> It might even be the same on s390 and PPC, in which case we probably
> want to implement this in virt/kvm.  Paul, Janosch, do you think this
> would make sense for you?  The original commit message is below.
> 
> Paolo
> 
>> On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
>>> Hello,
>>>
>>> This patch series adds a new SEV controller for tracking and limiting
>>> the usage of SEV ASIDs on the AMD SVM platform.
>>>
>>> SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
>>> but this resource is in very limited quantity on a host.
>>>
>>> This limited quantity creates issues like SEV ASID starvation and
>>> unoptimized scheduling in the cloud infrastructure.
>>>
>>> SEV controller provides SEV ASID tracking and resource control
>>> mechanisms.
> 

On s390 we currently support a few million protected guests per LPAR, so
guest IDs are not exactly scarce. However, having accounting for them
might add some value nevertheless, especially when running a large number
of protected containers.

@Christian: Any thoughts on this?
Christian Borntraeger Sept. 28, 2020, 9:21 a.m. UTC | #7
On 28.09.20 11:12, Janosch Frank wrote:
> On 9/23/20 2:47 PM, Paolo Bonzini wrote:
>> On 22/09/20 03:48, Sean Christopherson wrote:
>>> This should be genericized to not be SEV specific.  TDX has a similar
>>> scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
>>> (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
>>> would change anything, I think it'd just be a bunch of renaming.  The hardest
>>> part would probably be figuring out a name :-).
>>>
>>> Another idea would be to go even more generic and implement a KVM cgroup
>>> that accounts the number of VMs of a particular type, e.g. legacy, SEV,
>>> SEV-ES?, and TDX.  That has potential future problems though as it falls
>>> apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
>>> account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
>>> light of day.
>>
>> Or also MANY:1 (we are thinking of having multiple VMs share the same
>> SEV ASID).
>>
>> It might even be the same on s390 and PPC, in which case we probably
>> want to implement this in virt/kvm.  Paul, Janosch, do you think this
>> would make sense for you?  The original commit message is below.
>>
>> Paolo
>>
>>> On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
>>>> Hello,
>>>>
>>>> This patch series adds a new SEV controller for tracking and limiting
>>>> the usage of SEV ASIDs on the AMD SVM platform.
>>>>
>>>> SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
>>>> but this resource is in very limited quantity on a host.
>>>>
>>>> This limited quantity creates issues like SEV ASID starvation and
>>>> unoptimized scheduling in the cloud infrastructure.
>>>>
>>>> SEV controller provides SEV ASID tracking and resource control
>>>> mechanisms.
>>
> 
> On s390 we currently support a few million protected guests per LPAR so
> guest IDs are not exactly scarce. However having accounting for them
> might add some value nevertheless, especially when having large amount
> of protected containers.
> 
> @Christian: Any thoughts on this?

Yes, maybe it is a good idea to limit containers to only have a sane number
of secure guests, so that a malicious pod cannot consume all IDs by calling
CREATE_VM and KVM_PV_ENABLE a million times or so.
Peter Gonda Oct. 1, 2020, 6:08 p.m. UTC | #8
On Thu, Sep 24, 2020 at 1:55 PM Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 9/24/20 2:21 PM, Sean Christopherson wrote:
> > On Tue, Sep 22, 2020 at 02:14:04PM -0700, Vipin Sharma wrote:
> >> On Mon, Sep 21, 2020 at 06:48:38PM -0700, Sean Christopherson wrote:
> >>> On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
> >>>> Hello,
> >>>>
> >>>> This patch series adds a new SEV controller for tracking and limiting
> >>>> the usage of SEV ASIDs on the AMD SVM platform.
> >>>>
> >>>> SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
> >>>> but this resource is in very limited quantity on a host.
> >>>>
> >>>> This limited quantity creates issues like SEV ASID starvation and
> >>>> unoptimized scheduling in the cloud infrastructure.
> >>>>
> >>>> SEV controller provides SEV ASID tracking and resource control
> >>>> mechanisms.
> >>>
> >>> This should be genericized to not be SEV specific.  TDX has a similar
> >>> scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
> >>> (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
> >>> would change anything, I think it'd just be a bunch of renaming.  The hardest
> >>> part would probably be figuring out a name :-).
> >>>
> >>> Another idea would be to go even more generic and implement a KVM cgroup
> >>> that accounts the number of VMs of a particular type, e.g. legacy, SEV,
> >>> SEV-ES?, and TDX.  That has potential future problems though as it falls
> >>> apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
> >>> account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
> >>> light of day.
> >>
> >> I read about the TDX and its use of the KeyID for encrypting VMs. TDX
> >> has two kinds of KeyIDs private and shared.
> >
> > To clarify, "shared" KeyIDs are simply legacy MKTME KeyIDs.  This is relevant
> > because those KeyIDs can be used without TDX or KVM in the picture.
> >
> >> On AMD platform there are two types of ASIDs for encryption.
> >> 1. SEV ASID - Normal runtime guest memory encryption.
> >> 2. SEV-ES ASID - Extends SEV ASID by adding register state encryption with
> >>               integrity.
> >>
> >> Both types of ASIDs have their own maximum value which is provisioned in
> >> the firmware
> >
> > Ugh, I missed that detail in the SEV-ES RFC.  Does SNP add another ASID type,
> > or does it reuse SEV-ES ASIDs?  If it does add another type, is that trend
> > expected to continue, i.e. will SEV end up with SEV, SEV-ES, SEV-ES-SNP,
> > SEV-ES-SNP-X, SEV-ES-SNP-X-Y, etc...?
>
> SEV-SNP and SEV-ES share the same ASID range.

Where is this documented? The SEV-SNP FW ABI Spec 0.8 says, "The
firmware checks that ASID is an encryption capable ASID. If not, the
firmware returns INVALID_ASID." That doesn't make it clear that an SEV-ES
ASID is required. Should this document be more clear?
Tom Lendacky Oct. 1, 2020, 10:44 p.m. UTC | #9
On 10/1/20 1:08 PM, Peter Gonda wrote:
> On Thu, Sep 24, 2020 at 1:55 PM Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>
>> On 9/24/20 2:21 PM, Sean Christopherson wrote:
>>> On Tue, Sep 22, 2020 at 02:14:04PM -0700, Vipin Sharma wrote:
>>>> On Mon, Sep 21, 2020 at 06:48:38PM -0700, Sean Christopherson wrote:
>>>>> On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
>>>>>> Hello,
>>>>>>
>>>>>> This patch series adds a new SEV controller for tracking and limiting
>>>>>> the usage of SEV ASIDs on the AMD SVM platform.
>>>>>>
>>>>>> SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
>>>>>> but this resource is in very limited quantity on a host.
>>>>>>
>>>>>> This limited quantity creates issues like SEV ASID starvation and
>>>>>> unoptimized scheduling in the cloud infrastructure.
>>>>>>
>>>>>> SEV controller provides SEV ASID tracking and resource control
>>>>>> mechanisms.
>>>>>
>>>>> This should be genericized to not be SEV specific.  TDX has a similar
>>>>> scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
>>>>> (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
>>>>> would change anything, I think it'd just be a bunch of renaming.  The hardest
>>>>> part would probably be figuring out a name :-).
>>>>>
>>>>> Another idea would be to go even more generic and implement a KVM cgroup
>>>>> that accounts the number of VMs of a particular type, e.g. legacy, SEV,
>>>>> SEV-ES?, and TDX.  That has potential future problems though as it falls
>>>>> apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
>>>>> account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
>>>>> light of day.
>>>>
>>>> I read about the TDX and its use of the KeyID for encrypting VMs. TDX
>>>> has two kinds of KeyIDs private and shared.
>>>
>>> To clarify, "shared" KeyIDs are simply legacy MKTME KeyIDs.  This is relevant
>>> because those KeyIDs can be used without TDX or KVM in the picture.
>>>
>>>> On AMD platform there are two types of ASIDs for encryption.
>>>> 1. SEV ASID - Normal runtime guest memory encryption.
>>>> 2. SEV-ES ASID - Extends SEV ASID by adding register state encryption with
>>>>                integrity.
>>>>
>>>> Both types of ASIDs have their own maximum value which is provisioned in
>>>> the firmware
>>>
>>> Ugh, I missed that detail in the SEV-ES RFC.  Does SNP add another ASID type,
>>> or does it reuse SEV-ES ASIDs?  If it does add another type, is that trend
>>> expected to continue, i.e. will SEV end up with SEV, SEV-ES, SEV-ES-SNP,
>>> SEV-ES-SNP-X, SEV-ES-SNP-X-Y, etc...?
>>
>> SEV-SNP and SEV-ES share the same ASID range.
> 
> Where is this documented? From the SEV-SNP FW ABI Spec 0.8 "The
> firmware checks that ASID is an encryption capable ASID. If not, the
> firmware returns INVALID_ASID." that doesn't seem clear that an SEV-ES
> ASID is required. Should this document be more clear?

I let the owner of the spec know and it will be updated.

Thanks,
Tom

>
Vipin Sharma Oct. 2, 2020, 8:48 p.m. UTC | #10
On Fri, Sep 25, 2020 at 03:22:20PM -0700, Vipin Sharma wrote:
> On Thu, Sep 24, 2020 at 02:55:18PM -0500, Tom Lendacky wrote:
> > On 9/24/20 2:21 PM, Sean Christopherson wrote:
> > > On Tue, Sep 22, 2020 at 02:14:04PM -0700, Vipin Sharma wrote:
> > > > On Mon, Sep 21, 2020 at 06:48:38PM -0700, Sean Christopherson wrote:
> > > > > On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > This patch series adds a new SEV controller for tracking and limiting
> > > > > > the usage of SEV ASIDs on the AMD SVM platform.
> > > > > > 
> > > > > > SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
> > > > > > but this resource is in very limited quantity on a host.
> > > > > > 
> > > > > > This limited quantity creates issues like SEV ASID starvation and
> > > > > > unoptimized scheduling in the cloud infrastructure.
> > > > > > 
> > > > > > SEV controller provides SEV ASID tracking and resource control
> > > > > > mechanisms.
> > > > > 
> > > > > This should be genericized to not be SEV specific.  TDX has a similar
> > > > > scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
> > > > > (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
> > > > > would change anything, I think it'd just be a bunch of renaming.  The hardest
> > > > > part would probably be figuring out a name :-).
> > > > > 
> > > > > Another idea would be to go even more generic and implement a KVM cgroup
> > > > > that accounts the number of VMs of a particular type, e.g. legacy, SEV,
> > > > > SEV-ES?, and TDX.  That has potential future problems though as it falls
> > > > > apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
> > > > > account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
> > > > > light of day.
> > > > 
> > > > I read about the TDX and its use of the KeyID for encrypting VMs. TDX
> > > > has two kinds of KeyIDs private and shared.
> > > 
> > > To clarify, "shared" KeyIDs are simply legacy MKTME KeyIDs.  This is relevant
> > > because those KeyIDs can be used without TDX or KVM in the picture.
> > > 
> > > > On AMD platform there are two types of ASIDs for encryption.
> > > > 1. SEV ASID - Normal runtime guest memory encryption.
> > > > 2. SEV-ES ASID - Extends SEV ASID by adding register state encryption with
> > > > 		 integrity.
> > > > 
> > > > Both types of ASIDs have their own maximum value which is provisioned in
> > > > the firmware
> > > 
> > > Ugh, I missed that detail in the SEV-ES RFC.  Does SNP add another ASID type,
> > > or does it reuse SEV-ES ASIDs?  If it does add another type, is that trend
> > > expected to continue, i.e. will SEV end up with SEV, SEV-ES, SEV-ES-SNP,
> > > SEV-ES-SNP-X, SEV-ES-SNP-X-Y, etc...?
> > 
> > SEV-SNP and SEV-ES share the same ASID range.
> > 
> > Thanks,
> > Tom
> > 
> > > 
> > > > So, we are talking about 4 different types of resources:
> > > > 1. AMD SEV ASID (implemented in this patch as sev.* files in SEV cgroup)
> > > > 2. AMD SEV-ES ASID (in future, adding files like sev_es.*)
> > > > 3. Intel TDX private KeyID
> > > > 4. Intel TDX shared KeyID
> > > > 
> > > > TDX private KeyID is similar to SEV and SEV-ES ASID. I think coming up
> > > > with the same name which can be used by both platforms will not be easy,
> > > > and extensible with the future enhancements. This will get even more
> > > > difficult if Arm also comes up with something similar but with different
> > > > nuances.
> > > 
> > > Honest question, what's easier for userspace/orchestration layers?  Having an
> > > abstract but common name, or conrete but different names?  My gut reaction is
> > > to provide a common interface, but I can see how that could do more harm than
> > > good, e.g. some amount of hardware capabilitiy discovery is possible with
> > > concrete names.  And I'm guessing there's already a fair amount of vendor
> > > specific knowledge bleeding into userspace for these features...
> 
> I agree with you that the abstract name is better than the concrete
> name, I also feel that we must provide HW extensions. Here is one
> approach:
> 
> Cgroup name: cpu_encryption, encryption_slots, or memcrypt (open to
> suggestions)
> 
> Control files: slots.{max, current, events}
> 
> Contents of the slot.max:
> default max
> 	default: Corresponds to all kinds of encryption capabilities on
> 		 a platform. For AMD, it will be SEV and SEV-ES.  For
> 		 Intel, it will be TDX and MKTME. This can also be used
> 		 by other devices not just CPU.
> 
> 	max: max or any number to denote limit on the cgroup.
> 
> A user who wants the finer control, then they need to know about the
> capabilities a platform provides and use them, e.g. on AMD:
> 
> $ echo "sev-es 1000" > slot.max
> $ cat slots.max
> default max sev-es 1000
> 
> This means that in the cgroup maximum SEV-ES ASIDs which can be used is
> 1000 and SEV ASIDs is max (default, no limit).  Each platform can
> provide their own extensions which can be overwritten by a user,
> otherwise extensions will have the default limit.
> 
> This is kind of similar to the IO and the rdma controller.
> 
> I think it is keeping abstraction for userspace and also providing finer
> control for HW specific features.
> 
> What do you think about the above approach?  
> 
Hi Sean,

Any feedback/concerns on the above abstraction approach?

Thanks
Sean Christopherson Nov. 3, 2020, 2:06 a.m. UTC | #11
On Fri, Oct 02, 2020 at 01:48:10PM -0700, Vipin Sharma wrote:
> On Fri, Sep 25, 2020 at 03:22:20PM -0700, Vipin Sharma wrote:
> > I agree with you that the abstract name is better than the concrete
> > name, I also feel that we must provide HW extensions. Here is one
> > approach:
> > 
> > Cgroup name: cpu_encryption, encryption_slots, or memcrypt (open to
> > suggestions)
> > 
> > Control files: slots.{max, current, events}

I don't particularly like the "slots" name, mostly because it could be confused
with KVM's memslots.  Maybe encryption_ids.ids.{max, current, events}?  I don't
love those names either, but "encryption" and "IDs" are the two obvious
commonalities between TDX's encryption key IDs and SEV's encryption address
space IDs.
David Rientjes Nov. 14, 2020, 12:26 a.m. UTC | #12
On Mon, 2 Nov 2020, Sean Christopherson wrote:

> On Fri, Oct 02, 2020 at 01:48:10PM -0700, Vipin Sharma wrote:
> > On Fri, Sep 25, 2020 at 03:22:20PM -0700, Vipin Sharma wrote:
> > > I agree with you that the abstract name is better than the concrete
> > > name, I also feel that we must provide HW extensions. Here is one
> > > approach:
> > > 
> > > Cgroup name: cpu_encryption, encryption_slots, or memcrypt (open to
> > > suggestions)
> > > 
> > > Control files: slots.{max, current, events}
> 
> I don't particularly like the "slots" name, mostly because it could be confused
> with KVM's memslots.  Maybe encryption_ids.ids.{max, current, events}?  I don't
> love those names either, but "encryption" and "IDs" are the two obvious
> commonalities betwee TDX's encryption key IDs and SEV's encryption address
> space IDs.
> 

Looping Janosch and Christian back into the thread.

I interpret this suggestion as
encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel 
offerings, which was my thought on this as well.

Certainly the kernel could provide a single interface for all of these, 
with key-value pairs depending on the underlying encryption technology, but 
that seems to only introduce additional complexity in the kernel (string 
parsing) that can otherwise be avoided.  I think we all agree that either a 
single interface for all encryption keys or one-value-per-file could be done 
in the kernel and handled by any userspace agent that is configuring these 
values.

I think Vipin is adding a root level file that describes how many keys we 
have available on the platform for each technology.  So I think this comes 
down to, for example, a single encryption.max file vs 
encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned 
separately so we treat them as their own resource here.

So which is easier?

$ cat encryption.sev.max
10
$ echo -n 15 > encryption.sev.max

or

$ cat encryption.max
sev 10
sev_es 10
keyid 0
$ echo -n "sev 10" > encryption.max

I would argue the former is simplest (always preferring 
one-value-per-file) and avoids any string parsing or resource controller 
lookups that need to match on that string in the kernel.

The set of encryption.{sev,sev_es,keyid} files that exist would depend on
CONFIG_CGROUP_ENCRYPTION and whether CONFIG_AMD_MEM_ENCRYPT or 
CONFIG_INTEL_TDX is configured.  Both can be configured so we have all 
three files, but the root file will obviously indicate 0 keys available 
for one of them (can't run on AMD and Intel at the same time :).

So I'm inclined to suggest that the one-value-per-file format is the ideal 
way to go unless there are objections to it.
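
To illustrate the difference, the one-value-per-file variant needs no
resource-name matching at all; here is a sketch of possible
encryption.sev.max handlers, loosely modeled on pids.max (names and struct
layout are illustrative assumptions):

#include <linux/cgroup.h>
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/seq_file.h>
#include <linux/string.h>

struct enc_cgroup {
	struct cgroup_subsys_state css;
	u64 sev_max;
};

static ssize_t enc_sev_max_write(struct kernfs_open_file *of, char *buf,
				 size_t nbytes, loff_t off)
{
	struct enc_cgroup *cg = container_of(of_css(of), struct enc_cgroup, css);
	u64 max;

	buf = strstrip(buf);
	if (!strcmp(buf, "max"))
		max = U64_MAX;
	else if (kstrtou64(buf, 0, &max))
		return -EINVAL;

	cg->sev_max = max;	/* no per-resource string matching needed */
	return nbytes;
}

static int enc_sev_max_show(struct seq_file *sf, void *v)
{
	u64 max = container_of(seq_css(sf), struct enc_cgroup, css)->sev_max;

	if (max == U64_MAX)
		seq_puts(sf, "max\n");
	else
		seq_printf(sf, "%llu\n", max);
	return 0;
}
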
Sean Christopherson Nov. 24, 2020, 7:16 p.m. UTC | #13
On Fri, Nov 13, 2020, David Rientjes wrote:                                     
>                                                                               
> On Mon, 2 Nov 2020, Sean Christopherson wrote:                                
>                                                                               
> > On Fri, Oct 02, 2020 at 01:48:10PM -0700, Vipin Sharma wrote:               
> > > On Fri, Sep 25, 2020 at 03:22:20PM -0700, Vipin Sharma wrote:             
> > > > I agree with you that the abstract name is better than the concrete     
> > > > name, I also feel that we must provide HW extensions. Here is one       
> > > > approach:                                                               
> > > >                                                                         
> > > > Cgroup name: cpu_encryption, encryption_slots, or memcrypt (open to     
> > > > suggestions)                                                            
> > > >                                                                         
> > > > Control files: slots.{max, current, events}                             
> >                                                                             
> > I don't particularly like the "slots" name, mostly because it could be confused
> > with KVM's memslots.  Maybe encryption_ids.ids.{max, current, events}?  I don't
> > love those names either, but "encryption" and "IDs" are the two obvious     
> > commonalities betwee TDX's encryption key IDs and SEV's encryption address  
> > space IDs.                                                                  
> >                                                                             
>                                                                               
> Looping Janosch and Christian back into the thread.                           
>                                                                               
> I interpret this suggestion as                                                
> encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel         

I think it makes sense to use encryption_ids instead of simply encryption; that
way it's clear the cgroup is accounting IDs as opposed to restricting which
techs can be used on a yes/no basis.

> offerings, which was my thought on this as well.                              
>                                                                               
> Certainly the kernel could provide a single interface for all of these and    
> key value pairs depending on the underlying encryption technology but it      
> seems to only introduce additional complexity in the kernel in string         
> parsing that can otherwise be avoided.  I think we all agree that a single    
> interface for all encryption keys or one-value-per-file could be done in      
> the kernel and handled by any userspace agent that is configuring these       
> values.                                                                       
>                                                                               
> I think Vipin is adding a root level file that describes how many keys we     
> have available on the platform for each technology.  So I think this comes    
> down to, for example, a single encryption.max file vs                         
> encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned      

Are you suggesting that the cgroup omit "current" and "events"?  I agree there's
no need to enumerate platform total, but not knowing how many of the allowed IDs
have been allocated seems problematic.

> separately so we treat them as their own resource here.                       
>                                                                               
> So which is easier?                                                           
>                                                                               
> $ cat encryption.sev.max                                                      
> 10                                                                            
> $ echo -n 15 > encryption.sev.max                                             
>                                                                               
> or                                                                            
>                                                                               
> $ cat encryption.max                                                          
> sev 10                                                                        
> sev_es 10                                                                     
> keyid 0                                                                       
> $ echo -n "sev 10" > encryption.max                                           
>                                                                               
> I would argue the former is simplest (always preferring                       
> one-value-per-file) and avoids any string parsing or resource controller      
> lookups that need to match on that string in the kernel.                      

Ya, I prefer individual files as well.

I don't think "keyid" is the best name for TDX, it doesn't leave any wiggle room
if there are other flavors of key IDs on Intel platform, e.g. private vs. shared
in the future.  It's also inconsistent with the SEV names, e.g. "asid" isn't
mentioned anywhere.  And "keyid" sort of reads as "max key id", rather than "max
number of keyids".  Maybe "tdx_private", or simply "tdx"?  Doesn't have to be
solved now though, there's plenty of time before TDX will be upstream. :-)

> The set of encryption.{sev,sev_es,keyid} files that exist would depend on     
> CONFIG_CGROUP_ENCRYPTION and whether CONFIG_AMD_MEM_ENCRYPT or                
> CONFIG_INTEL_TDX is configured.  Both can be configured so we have all        
> three files, but the root file will obviously indicate 0 keys available       
> for one of them (can't run on AMD and Intel at the same time :).              
>                                                                               
> So I'm inclined to suggest that the one-value-per-file format is the ideal    
> way to go unless there are objections to it.
Vipin Sharma Nov. 24, 2020, 7:49 p.m. UTC | #14
On Tue, Nov 24, 2020 at 07:16:29PM +0000, Sean Christopherson wrote:
> On Fri, Nov 13, 2020, David Rientjes wrote:                                     
> >                                                                               
> > On Mon, 2 Nov 2020, Sean Christopherson wrote:                                
> >                                                                               
> > > On Fri, Oct 02, 2020 at 01:48:10PM -0700, Vipin Sharma wrote:               
> > > > On Fri, Sep 25, 2020 at 03:22:20PM -0700, Vipin Sharma wrote:             
> > > > > I agree with you that the abstract name is better than the concrete     
> > > > > name, I also feel that we must provide HW extensions. Here is one       
> > > > > approach:                                                               
> > > > >                                                                         
> > > > > Cgroup name: cpu_encryption, encryption_slots, or memcrypt (open to     
> > > > > suggestions)                                                            
> > > > >                                                                         
> > > > > Control files: slots.{max, current, events}                             
> > >                                                                             
> > > I don't particularly like the "slots" name, mostly because it could be confused
> > > with KVM's memslots.  Maybe encryption_ids.ids.{max, current, events}?  I don't
> > > love those names either, but "encryption" and "IDs" are the two obvious     
> > > commonalities betwee TDX's encryption key IDs and SEV's encryption address  
> > > space IDs.                                                                  
> > >                                                                             
> >                                                                               
> > Looping Janosch and Christian back into the thread.                           
> >                                                                               
> > I interpret this suggestion as                                                
> > encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel         
> 
> I think it makes sense to use encryption_ids instead of simply encryption, that
> way it's clear the cgroup is accounting ids as opposed to restricting what
> techs can be used on yes/no basis.
> 
> > offerings, which was my thought on this as well.                              
> >                                                                               
> > Certainly the kernel could provide a single interface for all of these and    
> > key value pairs depending on the underlying encryption technology but it      
> > seems to only introduce additional complexity in the kernel in string         
> > parsing that can otherwise be avoided.  I think we all agree that a single    
> > interface for all encryption keys or one-value-per-file could be done in      
> > the kernel and handled by any userspace agent that is configuring these       
> > values.                                                                       
> >                                                                               
> > I think Vipin is adding a root level file that describes how many keys we     
> > have available on the platform for each technology.  So I think this comes    
> > down to, for example, a single encryption.max file vs                         
> > encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned      
> 
> Are you suggesting that the cgroup omit "current" and "events"?  I agree there's
> no need to enumerate platform total, but not knowing how many of the allowed IDs
> have been allocated seems problematic.
> 

We will be showing encryption_ids.{sev,sev_es}.{max,current}. I am
inclined not to provide "events" as I am not using it; let me know if this
file is required and I can provide it then.

I will provide an encryption_ids.{sev,sev_es}.stat file, which shows the
total available IDs on the platform. This one will be useful for
scheduling jobs in the cloud infrastructure based on total supported
capacity.
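
A root-only file like that maps naturally onto the existing
CFTYPE_ONLY_ON_ROOT cftype flag; a minimal sketch, assuming a subsystem
named encryption_ids (the handler, counter, and output format are
illustrative assumptions):

#include <linux/cgroup.h>
#include <linux/seq_file.h>

/* Total SEV ASIDs on the host, filled at init (e.g. from the SEV CPUID leaf). */
static unsigned int sev_total_asids;

static int sev_stat_show(struct seq_file *sf, void *v)
{
	seq_printf(sf, "total %u\n", sev_total_asids);
	return 0;
}

static struct cftype enc_root_files[] = {
	{
		/* appears as encryption_ids.sev.stat, only in the root cgroup */
		.name     = "sev.stat",
		.flags    = CFTYPE_ONLY_ON_ROOT,
		.seq_show = sev_stat_show,
	},
	{ }	/* terminator */
};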

> > separately so we treat them as their own resource here.                       
> >                                                                               
> > So which is easier?                                                           
> >                                                                               
> > $ cat encryption.sev.max                                                      
> > 10                                                                            
> > $ echo -n 15 > encryption.sev.max                                             
> >                                                                               
> > or                                                                            
> >                                                                               
> > $ cat encryption.max                                                          
> > sev 10                                                                        
> > sev_es 10                                                                     
> > keyid 0                                                                       
> > $ echo -n "sev 10" > encryption.max                                           
> >                                                                               
> > I would argue the former is simplest (always preferring                       
> > one-value-per-file) and avoids any string parsing or resource controller      
> > lookups that need to match on that string in the kernel.                      
> 
> Ya, I prefer individual files as well.
> 
> I don't think "keyid" is the best name for TDX, it doesn't leave any wiggle room
> if there are other flavors of key IDs on Intel platform, e.g. private vs. shared
> in the future.  It's also inconsistent with the SEV names, e.g. "asid" isn't
> mentioned anywhere.  And "keyid" sort of reads as "max key id", rather than "max
> number of keyids".  Maybe "tdx_private", or simply "tdx"?  Doesn't have to be
> solved now though, there's plenty of time before TDX will be upstream. :-)
> 
> > The set of encryption.{sev,sev_es,keyid} files that exist would depend on     
> > CONFIG_CGROUP_ENCRYPTION and whether CONFIG_AMD_MEM_ENCRYPT or                
> > CONFIG_INTEL_TDX is configured.  Both can be configured so we have all        
> > three files, but the root file will obviously indicate 0 keys available       
> > for one of them (can't run on AMD and Intel at the same time :).              
> >                                                                               
> > So I'm inclined to suggest that the one-value-per-file format is the ideal    
> > way to go unless there are objections to it.
David Rientjes Nov. 24, 2020, 8:18 p.m. UTC | #15
On Tue, 24 Nov 2020, Vipin Sharma wrote:

> > > Looping Janosch and Christian back into the thread.                           
> > >                                                                               
> > > I interpret this suggestion as                                                
> > > encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel         
> > 
> > I think it makes sense to use encryption_ids instead of simply encryption, that
> > way it's clear the cgroup is accounting ids as opposed to restricting what
> > techs can be used on yes/no basis.
> > 

Agreed.

> > > offerings, which was my thought on this as well.                              
> > >                                                                               
> > > Certainly the kernel could provide a single interface for all of these and    
> > > key value pairs depending on the underlying encryption technology but it      
> > > seems to only introduce additional complexity in the kernel in string         
> > > parsing that can otherwise be avoided.  I think we all agree that a single    
> > > interface for all encryption keys or one-value-per-file could be done in      
> > > the kernel and handled by any userspace agent that is configuring these       
> > > values.                                                                       
> > >                                                                               
> > > I think Vipin is adding a root level file that describes how many keys we     
> > > have available on the platform for each technology.  So I think this comes    
> > > down to, for example, a single encryption.max file vs                         
> > > encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned      
> > 
> > Are you suggesting that the cgroup omit "current" and "events"?  I agree there's
> > no need to enumerate platform total, but not knowing how many of the allowed IDs
> > have been allocated seems problematic.
> > 
> 
> We will be showing encryption_ids.{sev,sev_es}.{max,current}
> I am inclined to not provide "events" as I am not using it, let me know
> if this file is required, I can provide it then.
> 
> I will provide an encryption_ids.{sev,sev_es}.stat file, which shows
> total available ids on the platform. This one will be useful for
> scheduling jobs in the cloud infrastructure based on total supported
> capacity.
> 

Makes sense.  I assume the stat file is only at the cgroup root level 
since it would otherwise be duplicating its contents in every cgroup under 
it.  Probably not very helpful for a child cgroup to see stat = 509 ASIDs 
but max = 100 :)
Vipin Sharma Nov. 24, 2020, 9:08 p.m. UTC | #16
On Tue, Nov 24, 2020 at 12:18:45PM -0800, David Rientjes wrote:
> On Tue, 24 Nov 2020, Vipin Sharma wrote:
> 
> > > > Looping Janosch and Christian back into the thread.                           
> > > >                                                                               
> > > > I interpret this suggestion as                                                
> > > > encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel         
> > > 
> > > I think it makes sense to use encryption_ids instead of simply encryption, that
> > > way it's clear the cgroup is accounting ids as opposed to restricting what
> > > techs can be used on yes/no basis.
> > > 
> 
> Agreed.
> 
> > > > offerings, which was my thought on this as well.                              
> > > >                                                                               
> > > > Certainly the kernel could provide a single interface for all of these and    
> > > > key value pairs depending on the underlying encryption technology but it      
> > > > seems to only introduce additional complexity in the kernel in string         
> > > > parsing that can otherwise be avoided.  I think we all agree that a single    
> > > > interface for all encryption keys or one-value-per-file could be done in      
> > > > the kernel and handled by any userspace agent that is configuring these       
> > > > values.                                                                       
> > > >                                                                               
> > > > I think Vipin is adding a root level file that describes how many keys we     
> > > > have available on the platform for each technology.  So I think this comes    
> > > > down to, for example, a single encryption.max file vs                         
> > > > encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned      
> > > 
> > > Are you suggesting that the cgroup omit "current" and "events"?  I agree there's
> > > no need to enumerate platform total, but not knowing how many of the allowed IDs
> > > have been allocated seems problematic.
> > > 
> > 
> > We will be showing encryption_ids.{sev,sev_es}.{max,current}
> > I am inclined to not provide "events" as I am not using it, let me know
> > if this file is required, I can provide it then.
> > 
> > I will provide an encryption_ids.{sev,sev_es}.stat file, which shows
> > total available ids on the platform. This one will be useful for
> > scheduling jobs in the cloud infrastructure based on total supported
> > capacity.
> > 
> 
> Makes sense.  I assume the stat file is only at the cgroup root level 
> since it would otherwise be duplicating its contents in every cgroup under 
> it.  Probably not very helpful for child cgroup to see stat = 509 ASIDs 
> but max = 100 :)

Yes, only at root.
Sean Christopherson Nov. 24, 2020, 9:27 p.m. UTC | #17
On Tue, Nov 24, 2020, Vipin Sharma wrote:
> On Tue, Nov 24, 2020 at 12:18:45PM -0800, David Rientjes wrote:
> > On Tue, 24 Nov 2020, Vipin Sharma wrote:
> > 
> > > > > Looping Janosch and Christian back into the thread.                           
> > > > >                                                                               
> > > > > I interpret this suggestion as                                                
> > > > > encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel         
> > > > 
> > > > I think it makes sense to use encryption_ids instead of simply encryption, that
> > > > way it's clear the cgroup is accounting ids as opposed to restricting what
> > > > techs can be used on yes/no basis.
> > > > 
> > 
> > Agreed.
> > 
> > > > > offerings, which was my thought on this as well.                              
> > > > >                                                                               
> > > > > Certainly the kernel could provide a single interface for all of these and    
> > > > > key value pairs depending on the underlying encryption technology but it      
> > > > > seems to only introduce additional complexity in the kernel in string         
> > > > > parsing that can otherwise be avoided.  I think we all agree that a single    
> > > > > interface for all encryption keys or one-value-per-file could be done in      
> > > > > the kernel and handled by any userspace agent that is configuring these       
> > > > > values.                                                                       
> > > > >                                                                               
> > > > > I think Vipin is adding a root level file that describes how many keys we     
> > > > > have available on the platform for each technology.  So I think this comes    
> > > > > down to, for example, a single encryption.max file vs                         
> > > > > encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned      
> > > > 
> > > > Are you suggesting that the cgroup omit "current" and "events"?  I agree there's
> > > > no need to enumerate platform total, but not knowing how many of the allowed IDs
> > > > have been allocated seems problematic.
> > > > 
> > > 
> > > We will be showing encryption_ids.{sev,sev_es}.{max,current}
> > > I am inclined to not provide "events" as I am not using it, let me know
> > > if this file is required, I can provide it then.

I've no objection to omitting events until it's needed.

> > > I will provide an encryption_ids.{sev,sev_es}.stat file, which shows
> > > total available ids on the platform. This one will be useful for
> > > scheduling jobs in the cloud infrastructure based on total supported
> > > capacity.
> > > 
> > 
> > Makes sense.  I assume the stat file is only at the cgroup root level 
> > since it would otherwise be duplicating its contents in every cgroup under 
> > it.  Probably not very helpful for child cgroup to see stat = 509 ASIDs 
> > but max = 100 :)
> 
> Yes, only at root.

Is a root level stat file needed?  Can't the infrastructure do .max - .current
on the root cgroup to calculate the number of available ids in the system?
Vipin Sharma Nov. 24, 2020, 10:21 p.m. UTC | #18
On Tue, Nov 24, 2020 at 09:27:25PM +0000, Sean Christopherson wrote:
> On Tue, Nov 24, 2020, Vipin Sharma wrote:
> > On Tue, Nov 24, 2020 at 12:18:45PM -0800, David Rientjes wrote:
> > > On Tue, 24 Nov 2020, Vipin Sharma wrote:
> > > 
> > > > > > Looping Janosch and Christian back into the thread.                           
> > > > > >                                                                               
> > > > > > I interpret this suggestion as                                                
> > > > > > encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel         
> > > > > 
> > > > > I think it makes sense to use encryption_ids instead of simply encryption, that
> > > > > way it's clear the cgroup is accounting ids as opposed to restricting what
> > > > > techs can be used on yes/no basis.
> > > > > 
> > > 
> > > Agreed.
> > > 
> > > > > > offerings, which was my thought on this as well.                              
> > > > > >                                                                               
> > > > > > Certainly the kernel could provide a single interface for all of these and    
> > > > > > key value pairs depending on the underlying encryption technology but it      
> > > > > > seems to only introduce additional complexity in the kernel in string         
> > > > > > parsing that can otherwise be avoided.  I think we all agree that a single    
> > > > > > interface for all encryption keys or one-value-per-file could be done in      
> > > > > > the kernel and handled by any userspace agent that is configuring these       
> > > > > > values.                                                                       
> > > > > >                                                                               
> > > > > > I think Vipin is adding a root level file that describes how many keys we     
> > > > > > have available on the platform for each technology.  So I think this comes    
> > > > > > down to, for example, a single encryption.max file vs                         
> > > > > > encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned      
> > > > > 
> > > > > Are you suggesting that the cgroup omit "current" and "events"?  I agree there's
> > > > > no need to enumerate platform total, but not knowing how many of the allowed IDs
> > > > > have been allocated seems problematic.
> > > > > 
> > > > 
> > > > We will be showing encryption_ids.{sev,sev_es}.{max,current}
> > > > I am inclined to not provide "events" as I am not using it, let me know
> > > > if this file is required, I can provide it then.
> 
> I've no objection to omitting events until it's needed.
> 
> > > > I will provide an encryption_ids.{sev,sev_es}.stat file, which shows
> > > > total available ids on the platform. This one will be useful for
> > > > scheduling jobs in the cloud infrastructure based on total supported
> > > > capacity.
> > > > 
> > > 
> > > Makes sense.  I assume the stat file is only at the cgroup root level 
> > > since it would otherwise be duplicating its contents in every cgroup under 
> > > it.  Probably not very helpful for child cgroup to see stat = 509 ASIDs 
> > > but max = 100 :)
> > 
> > Yes, only at root.
> 
> Is a root level stat file needed?  Can't the infrastructure do .max - .current
> on the root cgroup to calculate the number of available ids in the system?

For efficient scheduling of workloads in the cloud infrastructure, a
scheduler needs to know both the total capacity supported and the
current usage of the host to get the overall picture. There are a
couple of issues with the .max - .current approach:

1. Cgroup v2 convention is to not put resource control files in the
   root. This would mean summing (.max - .current) over all of the
   immediate children of the root (see the sketch after this list).

2. .max can hold any limit unless we add a check rejecting values larger
   than the supported capacity. That would effectively change the
   encryption_ids cgroup from a limit-based to an allocation-based
   resource distribution model, and the same validation would be needed
   in the child cgroups. I think providing a separate file on the root
   is a simpler approach.
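
For illustration, a rough sketch of the per-child bookkeeping userspace
would otherwise have to do. It assumes cgroup v2 mounted at
/sys/fs/cgroup and the encryption_ids.sev.{max,current} names proposed
in this thread; neither exists yet, so treat it purely as a sketch of
the extra work, not as a working interface:

#define _DEFAULT_SOURCE		/* for d_type/DT_DIR */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Read one value from <child>/<file>; -1 on error or the literal "max". */
static int read_val(const char *child, const char *file, unsigned long *val)
{
	char path[512], buf[64];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "/sys/fs/cgroup/%s/%s", child, file);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fgets(buf, sizeof(buf), f) && strcmp(buf, "max\n") &&
	     sscanf(buf, "%lu", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

int main(void)
{
	DIR *root = opendir("/sys/fs/cgroup");
	struct dirent *d;
	unsigned long max, cur, free_ids = 0;

	if (!root)
		return 1;
	/* Walk only the immediate children of the root cgroup. */
	while ((d = readdir(root))) {
		if (d->d_type != DT_DIR || d->d_name[0] == '.')
			continue;
		if (!read_val(d->d_name, "encryption_ids.sev.max", &max) &&
		    !read_val(d->d_name, "encryption_ids.sev.current", &cur))
			free_ids += max > cur ? max - cur : 0;
	}
	closedir(root);
	printf("free SEV ASIDs (sum over children): %lu\n", free_ids);
	return 0;
}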

To set a sensible max limit, a user needs to know the supported
capacity. For SEV and SEV-ES this is not exposed anywhere; the only way
to discover it is the CPUID instruction. The "stat" file will provide an
easy way to read it.

Since the current approach does not migrate charges, when a process
moves to another cgroup and the old cgroup is deleted (the user won't
see it, but it remains in the cgroup hierarchy internally), we cannot
recover the correct usage by walking the remaining cgroup directories.

I am suggesting that the root stat file show both the total supported
and the currently used counts:
$ cat encryption_ids.sev.stat
total 509
used 102

It will be very easy for a cloud scheduler to retrieve the system state
in a single read.
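
Purely as an illustration, a rough sketch of that scheduler-side read;
it assumes cgroup v2 mounted at /sys/fs/cgroup and the "total"/"used"
key-value format above, both of which are only what is being proposed
in this thread, not a merged interface:

#include <stdio.h>
#include <string.h>

int main(void)
{
	/* File name and format follow the proposal above. */
	FILE *f = fopen("/sys/fs/cgroup/encryption_ids.sev.stat", "r");
	char key[32];
	unsigned long val, total = 0, used = 0;

	if (!f) {
		perror("encryption_ids.sev.stat");
		return 1;
	}
	/* Parse "key value" pairs, one per line. */
	while (fscanf(f, "%31s %lu", key, &val) == 2) {
		if (!strcmp(key, "total"))
			total = val;
		else if (!strcmp(key, "used"))
			used = val;
	}
	fclose(f);

	printf("SEV ASIDs free on this host: %lu\n",
	       total >= used ? total - used : 0);
	return 0;
}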
Sean Christopherson Nov. 24, 2020, 11:18 p.m. UTC | #19
On Tue, Nov 24, 2020, Vipin Sharma wrote:
> On Tue, Nov 24, 2020 at 09:27:25PM +0000, Sean Christopherson wrote:
> > Is a root level stat file needed?  Can't the infrastructure do .max - .current
> > on the root cgroup to calculate the number of available ids in the system?
> 
> For an efficient scheduling of workloads in the cloud infrastructure, a
> scheduler needs to know the total capacity supported and the current
> usage of the host to get the overall picture. There are some issues with
> .max -.current approach:
> 
> 1. Cgroup v2 convention is to not put resource control files in the
>    root. This will mean we need to sum (.max -.current) in all of the
>    immediate children of the root.

Ah, that's annoying.  Now that you mention it, I do vaguely recall this behavior.
 
> 2. .max can have any limit unless we add a check to not allow a user to
>    set any value more than the supported one. 

Duh, didn't think that one through.

Thanks!
Christian Borntraeger Nov. 27, 2020, 6:01 p.m. UTC | #20
On 24.11.20 20:16, Sean Christopherson wrote:
> On Fri, Nov 13, 2020, David Rientjes wrote:                                     
>>                                                                               
>> On Mon, 2 Nov 2020, Sean Christopherson wrote:                                
>>                                                                               
>>> On Fri, Oct 02, 2020 at 01:48:10PM -0700, Vipin Sharma wrote:               
>>>> On Fri, Sep 25, 2020 at 03:22:20PM -0700, Vipin Sharma wrote:             
>>>>> I agree with you that the abstract name is better than the concrete     
>>>>> name, I also feel that we must provide HW extensions. Here is one       
>>>>> approach:                                                               
>>>>>                                                                         
>>>>> Cgroup name: cpu_encryption, encryption_slots, or memcrypt (open to     
>>>>> suggestions)                                                            
>>>>>                                                                         
>>>>> Control files: slots.{max, current, events}                             
>>>                                                                             
>>> I don't particularly like the "slots" name, mostly because it could be confused
>>> with KVM's memslots.  Maybe encryption_ids.ids.{max, current, events}?  I don't
>>> love those names either, but "encryption" and "IDs" are the two obvious     
>>> commonalities betwee TDX's encryption key IDs and SEV's encryption address  
>>> space IDs.                                                                  
>>>                                                                             
>>                                                                               
>> Looping Janosch and Christian back into the thread.                           
>>                                                                               
>> I interpret this suggestion as                                                
>> encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel         
> 
> I think it makes sense to use encryption_ids instead of simply encryption, that
> way it's clear the cgroup is accounting ids as opposed to restricting what
> techs can be used on yes/no basis.

For what it's worth, the IDs for s390x are called SEIDs (secure execution IDs).

> 
>> offerings, which was my thought on this as well.                              
>>                                                                               
>> Certainly the kernel could provide a single interface for all of these and    
>> key value pairs depending on the underlying encryption technology but it      
>> seems to only introduce additional complexity in the kernel in string         
>> parsing that can otherwise be avoided.  I think we all agree that a single    
>> interface for all encryption keys or one-value-per-file could be done in      
>> the kernel and handled by any userspace agent that is configuring these       
>> values.                                                                       
>>                                                                               
>> I think Vipin is adding a root level file that describes how many keys we     
>> have available on the platform for each technology.  So I think this comes    
>> down to, for example, a single encryption.max file vs                         
>> encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned      
> 
> Are you suggesting that the cgroup omit "current" and "events"?  I agree there's
> no need to enumerate platform total, but not knowing how many of the allowed IDs
> have been allocated seems problematic.
> 
>> separately so we treat them as their own resource here.                       
>>                                                                               
>> So which is easier?                                                           
>>                                                                               
>> $ cat encryption.sev.max                                                      
>> 10                                                                            
>> $ echo -n 15 > encryption.sev.max                                             
>>                                                                               
>> or                                                                            
>>                                                                               
>> $ cat encryption.max                                                          
>> sev 10                                                                        
>> sev_es 10                                                                     
>> keyid 0                                                                       
>> $ echo -n "sev 10" > encryption.max                                           
>>                                                                               
>> I would argue the former is simplest (always preferring                       
>> one-value-per-file) and avoids any string parsing or resource controller      
>> lookups that need to match on that string in the kernel.                      

I like the idea of having encryption_ids.max for all platforms.
If we go for individual files, "seid" seems the best name for s390.

> 
> Ya, I prefer individual files as well.
> 
> I don't think "keyid" is the best name for TDX, it doesn't leave any wiggle room
> if there are other flavors of key IDs on Intel platform, e.g. private vs. shared
> in the future.  It's also inconsistent with the SEV names, e.g. "asid" isn't
> mentioned anywhere.  And "keyid" sort of reads as "max key id", rather than "max
> number of keyids".  Maybe "tdx_private", or simply "tdx"?  Doesn't have to be
> solved now though, there's plenty of time before TDX will be upstream. :-)
> 
>> The set of encryption.{sev,sev_es,keyid} files that exist would depend on     
>> CONFIG_CGROUP_ENCRYPTION and whether CONFIG_AMD_MEM_ENCRYPT or                
>> CONFIG_INTEL_TDX is configured.  Both can be configured so we have all        
>> three files, but the root file will obviously indicate 0 keys available       
>> for one of them (can't run on AMD and Intel at the same time :).              
>>                                                                               
>> So I'm inclined to suggest that the one-value-per-file format is the ideal    
>> way to go unless there are objections to it.