mbox series

[v3,0/2] cgroup: KVM: New Encryption IDs cgroup controller

Message ID 20201209205413.3391139-1-vipinsh@google.com (mailing list archive)
Headers show
Series cgroup: KVM: New Encryption IDs cgroup controller | expand

Message

Vipin Sharma Dec. 9, 2020, 8:54 p.m. UTC
Hello,

This patch adds a new cgroup controller, Encryption IDs, to track and
limit the usage of encryption IDs on a host.

AMD provides Secure Encrypted Virtualization (SEV) and SEV with
Encrypted State (SEV-ES) to encrypt the guest OS's memory using limited
number of Address Space Identifiers (ASIDs).

This limited number of ASIDs creates issues like SEV ASID starvation and
unoptimized scheduling in the cloud infrastucture.

In the RFC patch v1, I provided only SEV cgroup controller but based
on the feedback and discussion it became clear that this cgroup
controller can be extended to be used by Intel's Trusted Domain
Extension (TDX) and s390's protected virtualization Secure Execution IDs
(SEID)

This patch series provides a generic Encryption IDs controller with
tracking support of the SEV ASIDs.

Changes in v3:
- Fixes a build error when CONFIG_CGROUP is disabled.

Changes in v2:
- Changed cgroup name from sev to encryption_ids.
- Replaced SEV specific names in APIs and documentations with generic
  encryption IDs.
- Providing 3 cgroup files per encryption ID type. For example in SEV,
  - encryption_ids.sev.stat (only in the root cgroup directory).
  - encryption_ids.sev.max
  - encryption_ids.sev.current

Thanks
Vipin Sharma

[1] https://lore.kernel.org/lkml/20200922004024.3699923-1-vipinsh@google.com/#r
[2] https://lore.kernel.org/lkml/20201208213531.2626955-1-vipinsh@google.com/

 .../admin-guide/cgroup-v1/encryption_ids.rst  | 108 +++++
 Documentation/admin-guide/cgroup-v2.rst       |  78 +++-
 arch/x86/kvm/svm/sev.c                        |  28 +-
 include/linux/cgroup_subsys.h                 |   4 +
 include/linux/encryption_ids_cgroup.h         |  71 +++
 include/linux/kvm_host.h                      |   4 +
 init/Kconfig                                  |  14 +
 kernel/cgroup/Makefile                        |   1 +
 kernel/cgroup/encryption_ids.c                | 430 ++++++++++++++++++
 9 files changed, 729 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/admin-guide/cgroup-v1/encryption_ids.rst
 create mode 100644 include/linux/encryption_ids_cgroup.h
 create mode 100644 kernel/cgroup/encryption_ids.c

Comments

Tejun Heo Dec. 9, 2020, 8:58 p.m. UTC | #1
Hello,

Rough take after skimming:

* I don't have an overall objection. In terms of behavior, the only thing
  which stood out was input rejection depending on the current usage. The
  preferred way of handling that is rejecting future allocations rather than
  failing configuration as that makes it impossible e.g. to lower limit and
  drain existing usages from outside the container.

* However, the boilerplate to usefulness ratio doesn't look too good and I
  wonder whether what we should do is adding a generic "misc" controller
  which can host this sort of static hierarchical counting. I'll think more
  on it.

Thanks.
Christian Borntraeger Dec. 10, 2020, 2:54 p.m. UTC | #2
On 09.12.20 21:58, Tejun Heo wrote:
> Hello,
> 
> Rough take after skimming:
> 
> * I don't have an overall objection. In terms of behavior, the only thing
>   which stood out was input rejection depending on the current usage. The
>   preferred way of handling that is rejecting future allocations rather than
>   failing configuration as that makes it impossible e.g. to lower limit and
>   drain existing usages from outside the container.
> 
> * However, the boilerplate to usefulness ratio doesn't look too good and I
>   wonder whether what we should do is adding a generic "misc" controller
>   which can host this sort of static hierarchical counting. I'll think more
>   on it.

We first dicussed to have
encryption_ids.stat
encryption_ids.max
encryption_ids.current

and we added the sev in later, so that we can also have tdx, seid, sgx or whatever.
Maybe also 2 or more things at the same time.

Right now this code has

encryption_ids.sev.stat
encryption_ids.sev.max
encryption_ids.sev.current

And it would be trivial to extend it to have
encryption_ids.seid.stat
encryption_ids.seid.max
encryption_ids.seid.current
on s390 instead (for our secure guests).

So in the end this is almost already a misc controller, the only thing that we
need to change is the capability to also define things other than encryption.*.*
And of course we would need to avoid adding lots of random garbage to such a thing.

But if you feel ok with the burden to keep things kind of organized a misc
controller would certainly work for the encryption ID usecase as well. 
So I would be fine with the thing as is or a misc controlĺer.

Christian
David Rientjes Dec. 10, 2020, 11:44 p.m. UTC | #3
On Thu, 10 Dec 2020, Christian Borntraeger wrote:

> > * However, the boilerplate to usefulness ratio doesn't look too good and I
> >   wonder whether what we should do is adding a generic "misc" controller
> >   which can host this sort of static hierarchical counting. I'll think more
> >   on it.
> 
> We first dicussed to have
> encryption_ids.stat
> encryption_ids.max
> encryption_ids.current
> 
> and we added the sev in later, so that we can also have tdx, seid, sgx or whatever.
> Maybe also 2 or more things at the same time.
> 
> Right now this code has
> 
> encryption_ids.sev.stat
> encryption_ids.sev.max
> encryption_ids.sev.current
> 
> And it would be trivial to extend it to have
> encryption_ids.seid.stat
> encryption_ids.seid.max
> encryption_ids.seid.current
> on s390 instead (for our secure guests).
> 
> So in the end this is almost already a misc controller, the only thing that we
> need to change is the capability to also define things other than encryption.*.*
> And of course we would need to avoid adding lots of random garbage to such a thing.
> 
> But if you feel ok with the burden to keep things kind of organized a misc
> controller would certainly work for the encryption ID usecase as well. 
> So I would be fine with the thing as is or a misc controlĺer.
> 

Yeah, I think generalization of this would come in the form of either (1) 
the dumping ground of an actual "misc" controller, that you elude to, or 
(2) a kernel abstraction so you can spin up your own generic controller 
that has the {current, max, stat} support.  In the case of the latter, 
encryption IDs becomes a user of that abstraction.

Concern with a single misc controller would be that any subsystem that 
wants to use it has to exactly fit this support: current, max, stat, 
nothing more.  The moment a controller needs some additional support, and 
its controller is already implemented in previous kernel versionv as a 
part of "misc," we face a problem.

On the other hand, a kernel abstraction that provides just the basic 
{current, max, stat} support might be interesting if it can be extended by 
the subsystem instance using it.
Tejun Heo Dec. 16, 2020, 3:27 p.m. UTC | #4
Hello,

On Thu, Dec 10, 2020 at 03:44:35PM -0800, David Rientjes wrote:
> Concern with a single misc controller would be that any subsystem that 
> wants to use it has to exactly fit this support: current, max, stat, 
> nothing more.  The moment a controller needs some additional support, and 
> its controller is already implemented in previous kernel versionv as a 
> part of "misc," we face a problem.

Yeah, that's a valid concern, but given the history, there doesn't seem to
be much need beyond that for these use cases and the limited need seems
inherent to the way the resources are defined and consumed. So yeah, it can
either way.

Thanks.
Vipin Sharma Dec. 16, 2020, 8:02 p.m. UTC | #5
On Wed, Dec 9, 2020 at 12:59 PM Tejun Heo <tj@kernel.org> wrote:
> * I don't have an overall objection. In terms of behavior, the only thing
>   which stood out was input rejection depending on the current usage. The
>   preferred way of handling that is rejecting future allocations rather than
>   failing configuration as that makes it impossible e.g. to lower limit and
>   drain existing usages from outside the container.

Thanks. In next version of the patch I will remove rejection of max value
based on current usage.

On Wed, Dec 16, 2020 at 7:27 AM Tejun Heo <tj@kernel.org> wrote:
> On Thu, Dec 10, 2020 at 03:44:35PM -0800, David Rientjes wrote:
> > Concern with a single misc controller would be that any subsystem that
> > wants to use it has to exactly fit this support: current, max, stat,
> > nothing more.  The moment a controller needs some additional support, and
> > its controller is already implemented in previous kernel versionv as a
> > part of "misc," we face a problem.
>
> Yeah, that's a valid concern, but given the history, there doesn't seem to
> be much need beyond that for these use cases and the limited need seems
> inherent to the way the resources are defined and consumed. So yeah, it can
> either way.

I think a misc controller should be able support other "Resource Distribution
Models" mentioned in the cgroup v2 documentation besides limits. There might be
use cases in future which want to use weight, protection or allocation models.
If that happens it will be more difficult to support these different resources.
This will also mean the same hierarchy might get charged differently by the
same controller.

I like the idea of having a separate controller to keep the code simple and
easier for maintenance.

If you decide to have a separate misc controller please let me know what will
be the overall expectations and I can change my patch to reflect that,
otherwise I can send out a new patch with just removal of max input rejection.

My understanding with a "misc" controller is it will be something like
- cgroup v2
  cgroup/misc.encryption_ids.{sev, tdx, seid}.{stat, max, current}

- cgroup v1
  cgroup/misc/misc.encryption_ids.{sev, tdx, seid}.{stat, max, current}

Thanks
Vipin
Tejun Heo Jan. 5, 2021, 3:36 p.m. UTC | #6
Happy new year!

On Wed, Dec 16, 2020 at 12:02:37PM -0800, Vipin Sharma wrote:
> I like the idea of having a separate controller to keep the code simple and
> easier for maintenance.

Yeah, the more I think about it, keeping it separate seems like the right
thing to do. What bothers me primarily is that the internal logic is
identical between the RDMA controller and this one. If you wanna try
factoring them out into library, great. If not, I don't think it should
block merging this controller. We can get to refactoring later.

Thanks.
Vipin Sharma Jan. 6, 2021, 6:45 p.m. UTC | #7
On Tue, Jan 05, 2021 at 10:36:40AM -0500, Tejun Heo wrote:
> Happy new year!
> 
> On Wed, Dec 16, 2020 at 12:02:37PM -0800, Vipin Sharma wrote:
> > I like the idea of having a separate controller to keep the code simple and
> > easier for maintenance.
> 
> Yeah, the more I think about it, keeping it separate seems like the right
> thing to do. What bothers me primarily is that the internal logic is
> identical between the RDMA controller and this one. If you wanna try
> factoring them out into library, great. If not, I don't think it should
> block merging this controller. We can get to refactoring later.
> 

Happy new year!
Sounds great, I will send out a new patch which will not reject the new
max limit based on the current usage. It will not include refactoring
out common code between RDMA and Encryption ID controller. We can pursue
that later.

Thanks
Vipin