Message ID | 20210304231946.2766648-1-vipinsh@google.com (mailing list archive) |
---|---|
Headers | show |
Series | cgroup: New misc cgroup controller | expand |
Hello, On Thu, Mar 04, 2021 at 03:19:44PM -0800, Vipin Sharma wrote: > This patch series is creating a new misc cgroup controller for limiting > and tracking of resources which are not abstract like other cgroup > controllers. > > This controller was initially proposed as encryption_id but after > the feedbacks, it is now changed to misc cgroup. > https://lore.kernel.org/lkml/20210108012846.4134815-2-vipinsh@google.com/ Vipin, thank you very much for your persistence and patience. The patchset looks good to me. Michal, as you've been reviewing the series, can you please take another look and ack them if you don't find anything objectionable?
Hello. On Sun, Mar 07, 2021 at 07:48:40AM -0500, Tejun Heo <tj@kernel.org> wrote: > Vipin, thank you very much for your persistence and patience. Yes, and thanks for taking my remarks into account. > Michal, as you've been reviewing the series, can you please take > another look and ack them if you don't find anything objectionable? Honestly, I'm still sitting on the fence whether this needs a new controller and whether the miscontroller (:-p) is a good approach in the long term [1]. I admit, I didn't follow the past dicussions completely, however, (Vipin) could it be in the cover letter/commit messages shortly summarized why cgroups and a controller were chosen to implement restrictions of these resources, what were the alternatives any why were they rejected? In the previous discussion, I saw the reasoning for the list of the resources to be hardwired in the controller itself in order to get some scrutiny of possible changes. That makes sense to me. But with that, is it necessary to commit to the new controller API via EXPORT_SYMBOL? (I don't mean this as a licensing question but what the external API should be (if any).) Besides the generic remarks above, I'd still suggest some slight implementation changes, posted inline to the patch. Thanks, Michal [1] Currently, only one thing comes to my mind -- the delegation via cgroup.subtree_control. The miscontroller may add possibly further resources whose delegation granularity is bunched up under one entry.
Hello, On Thu, Mar 11, 2021 at 07:58:19PM +0100, Michal Koutný wrote: > > Michal, as you've been reviewing the series, can you please take > > another look and ack them if you don't find anything objectionable? > Honestly, I'm still sitting on the fence whether this needs a new > controller and whether the miscontroller (:-p) is a good approach in the > long term [1]. Yeah, it's a bit of cop-out. My take is that the underlying hardware feature isn't mature enough to have reasonable abstraction built on top of them. Given time, maybe future iterations will get there or maybe it's a passing fad and people will mostly forget about these. In the meantime, keeping them out of cgroup is one direction, a relatively high friction one but still viable. Or we can provide something of a halfway house so that people who have immediate needs can still leverage the existing infrastructure while controlling the amount of time, energy and future lock-ins they take. So, that's misc controller. I'm somewhat ambivalent but we've had multiple of these things popping up in the past several years and containment seems to be a reasonable approach at this point. > [1] Currently, only one thing comes to my mind -- the delegation via > cgroup.subtree_control. The miscontroller may add possibly further > resources whose delegation granularity is bunched up under one entry. Controller enabling and delegation in themselves aren't supposed to have resource or security implications, so I don't think it's a practical problem. Thanks.
On Thu, Mar 11, 2021 at 07:58:19PM +0100, Michal Koutný wrote: > I admit, I didn't follow the past dicussions completely, however, > (Vipin) could it be in the cover letter/commit messages shortly > summarized why cgroups and a controller were chosen to implement > restrictions of these resources, what were the alternatives any why were > they rejected? I will add some more information in the cover letter of the next version. Basically, SEV will mostly be used by cloud providers for providing confidential VMs. Since they are limited we need a good way to schedule these jobs in cloud infrastructure. To achieve this we either come up with some ioctl for "/dev/sev" to know about its usage, availability, etc. This requires existing scheduling mechanism in the cloud to have an extension for this interaction. Now same thing needs to be done for TDX. IBM SEID doesn't have scarcity of this resource but they are also interested in tracking and limiting the usage. Each one coming up with their own interaction is a duplicate effort when they all need similar thing. One can say that abstraction should be at KVM level but these resources can be used outside VM as well. Most of the cloud infrastructure use cgroups for knowing the host state, track the resources usage, enforce limits on them, etc. They use this info to optimize work allocation in the fleet and make sure no rogue job consumes more than it needs and starves other. Adding these resources to cgroup is a natural choice with least friction. Cgroup itself says it is a mechanism to distribute system resources along the hierarchy in a controlled mechanism and configurable manner. Most of the resources in cgroups are abstracted enough but their are still resources which are not abstract but have limited availability or have specific use cases. > > In the previous discussion, I saw the reasoning for the list of the > resources to be hardwired in the controller itself in order to get some > scrutiny of possible changes. That makes sense to me. But with that, is > it necessary to commit to the new controller API via EXPORT_SYMBOL? (I > don't mean this as a licensing question but what the external API should > be (if any).) As per my understanding this is the only for way for loadable modules (kvm-amd in this case) to access Kernel APIs. Let me know if there is a better way to do it. > > Besides the generic remarks above, I'd still suggest some slight > implementation changes, posted inline to the patch. I will work on them. I appreciate you guys taking out time and helping me out with this patch series. Thanks Vipin
On Fri, Mar 12, 2021 at 09:49:26AM -0800, Vipin Sharma <vipinsh@google.com> wrote: > I will add some more information in the cover letter of the next version. Thanks. > Each one coming up with their own interaction is a duplicate effort > when they all need similar thing. Could this be expressed as a new BPF hook (when allocating/freeing such a resource unit)? The decision could be made based on the configured limit or even some other predicate. (I saw this proposed already but I haven't seen some more reasoning whether it's worse/better. IMO, BPF hooks are "cheaper" than full-blown controllers, though it's still new user API.) > As per my understanding this is the only for way for loadable modules > (kvm-amd in this case) to access Kernel APIs. Let me know if there is a > better way to do it. I understood the symbols are exported for such modularized builds. However, making them non-GPL exposes them to any out-of-tree modules, although, the resource types are supposed to stay hardcoded in the misc controller. So my point was to make them EXPORT_SYMBOL_GPL to mark they're just a means of implementing the modularized builds and not an API. (But they'd remain API for out-of-tree GPL modules anyway, so take this reasoning of mine with a grain of salt.) Michal
On Mon, Mar 15, 2021 at 08:10:09PM +0100, Michal Koutný wrote: > On Fri, Mar 12, 2021 at 09:49:26AM -0800, Vipin Sharma <vipinsh@google.com> wrote: > > I will add some more information in the cover letter of the next version. > Thanks. > > > Each one coming up with their own interaction is a duplicate effort > > when they all need similar thing. > Could this be expressed as a new BPF hook (when allocating/freeing such > a resource unit)? > > The decision could be made based on the configured limit or even some > other predicate. > > (I saw this proposed already but I haven't seen some more reasoning > whether it's worse/better. IMO, BPF hooks are "cheaper" than full-blown > controllers, though it's still new user API.) I am not much knowledgeable with BPF, so, I might be wrong here. There are couple of things which might not be addressed with BPF: 1. Which controller to use in v1 case? These are not abstract resources so in v1 where each controller have their own hierarchy it might not be easy to identify the best controller. 2. It seems to me that we won't be able to abstract out a single BPF program which can help with all of the resources types we are planning to use, again, because it is not an abstract type like network packets, and there will be different places in the source code to use these resources. To me a cgroup tends to give much easier and well integrated solution when it comes to scheduling and limiting a resource with existing tools in a cloud infrastructure. > > > > As per my understanding this is the only for way for loadable modules > > (kvm-amd in this case) to access Kernel APIs. Let me know if there is a > > better way to do it. > I understood the symbols are exported for such modularized builds. > However, making them non-GPL exposes them to any out-of-tree modules, > although, the resource types are supposed to stay hardcoded in the misc > controller. So my point was to make them EXPORT_SYMBOL_GPL to mark > they're just a means of implementing the modularized builds and not an > API. (But they'd remain API for out-of-tree GPL modules anyway, so take > this reasoning of mine with a grain of salt.) > I see, I will change it to GPL. Thanks Vipin