Message ID | 20230322142525.162469-1-flosch@nutanix.com (mailing list archive) |
---|---|
State | New |
Series | [RFC] memcg v1: provide read access to memory.pressure_level |
On Wed 22-03-23 14:25:25, Florian Schmidt wrote:
> cgroups v1 has a unique way of setting up memory pressure notifications:
> the user opens "memory.pressure_level" of the cgroup they want to
> monitor for pressure, then open "cgroup.event_control" and write the fd
> (among other things) to that file. memory.pressure_level has no other
> use, specifically it does not support any read or write operations.
> Consequently, no handlers are provided, and the file ends up with
> permissions 000. However, to actually use the mechanism, the subscribing
> user must have read access to the file and open the fd for reading, see
> memcg_write_event_control().
>
> This is all fine as long as the subscribing process runs as root and is
> otherwise unconfined by further restrictions. However, if you add strict
> access controls such as selinux, the permission bits will be enforced,
> and opening memory.pressure_level for reading will fail, preventing the
> process from subscribing, even as root.
>
>
> There are several ways around this issue, but adding a dummy read
> handler seems like the least invasive to me.

I was struggling to see how that addresses the problem because all you
need is a read permission. But then I looked into the cgroup code and
learned that permissions are constructed based on available callbacks
(cgroup_file_mode). Mentioning this would have made the review easier ;)

I have no issue with the patch. It would be great to hear from cgroup
maintainers whether a concept of default permissions is something that
would be useful also for other files.

> I'd be interested to hear:
> (a) do you think there is a less invasive way? Alternatively, we could
> add a flag in cftype in include/linux/cgroup-defs.h, but that seems
> more invasive for what is a legacy interface.
> (b) would you be interested to take this patch, or is it too niche a fix
> for a legacy subsystem?

After you add your s-o-b, feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>

If cgroup people find a concept of default permissions for a cgroup file
sound then this could be replaced by that approach but this is really an
easy workaround.

> ---
>  mm/memcontrol.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5abffe6f8389..e48c749d9724 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3734,6 +3734,16 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
>  	}
>  }
>
> +/*
> + * This function doesn't do anything useful. Its only job is to provide a read
> + * handler so that the file gets read permissions when it's created.

I would just reference cgroup_file_mode() in the comment to make our
lives easier and the comment more helpful.

> + */
> +static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m,
> +				     __always_unused void *v)
> +{
> +	return -EINVAL;
> +}
> +
>  #ifdef CONFIG_MEMCG_KMEM
>  static int memcg_online_kmem(struct mem_cgroup *memcg)
>  {
> @@ -5064,6 +5074,7 @@ static struct cftype mem_cgroup_legacy_files[] = {
>  	},
>  	{
>  		.name = "pressure_level",
> +		.seq_show = mem_cgroup_dummy_seq_show,
>  	},
>  #ifdef CONFIG_NUMA
>  	{
> --
> 2.32.0
On 22/03/2023 15:57, Michal Hocko wrote:
> On Wed 22-03-23 14:25:25, Florian Schmidt wrote:
>> cgroups v1 has a unique way of setting up memory pressure notifications:
>> the user opens "memory.pressure_level" of the cgroup they want to
>> monitor for pressure, then open "cgroup.event_control" and write the fd
>> (among other things) to that file. memory.pressure_level has no other
>> use, specifically it does not support any read or write operations.
>> Consequently, no handlers are provided, and the file ends up with
>> permissions 000. However, to actually use the mechanism, the subscribing
>> user must have read access to the file and open the fd for reading, see
>> memcg_write_event_control().
>>
>> This is all fine as long as the subscribing process runs as root and is
>> otherwise unconfined by further restrictions. However, if you add strict
>> access controls such as selinux, the permission bits will be enforced,
>> and opening memory.pressure_level for reading will fail, preventing the
>> process from subscribing, even as root.
>>
>>
>> There are several ways around this issue, but adding a dummy read
>> handler seems like the least invasive to me.
>
> I was struggling to see how that addresses the problem because all you
> need is a read permission. But then I've looked into cgroup code and
> learned that permissions are constructed based on available callbacks
> (cgroup_file_mode). This would have made the review easier ;)

Oh, sorry, I forgot to mention that salient detail! I didn't check whether
that was a common pattern or not...

>
> I have no issue with the patch. It would be great to hear from cgroup
> maintainers whether a concept of default permissions is something that
> would be useful also for other files.
>
>> I'd be interested to hear:
>> (a) do you think there is a less invasive way? Alternatively, we could
>> add a flag in cftype in include/linux/cgroup-defs.h, but that seems
>> more invasive for what is a legacy interface.
>> (b) would you be interested to take this patch, or is it too niche a fix
>> for a legacy subsystem?
>
> After you add your s-o-b, feel free to add
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> If cgroup people find a concept of default permissions for a cgroup file
> sound then this could be replaced by that approach but this is really an
> easy workaround.

Will do. Once I know the path forward and have written a proper commit
message, I'll add the s-o-b and ack.

>> ---
>>  mm/memcontrol.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 5abffe6f8389..e48c749d9724 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3734,6 +3734,16 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
>>  	}
>>  }
>>
>> +/*
>> + * This function doesn't do anything useful. Its only job is to provide a read
>> + * handler so that the file gets read permissions when it's created.
>
> I would just reference cgroup_file_mode() in the comment to make our
> lives easier and the comment more helpful.

Ack.

>
>> + */
>> +static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m,
>> +				     __always_unused void *v)
>> +{
>> +	return -EINVAL;
>> +}
>> +
>>  #ifdef CONFIG_MEMCG_KMEM
>>  static int memcg_online_kmem(struct mem_cgroup *memcg)
>>  {
>> @@ -5064,6 +5074,7 @@ static struct cftype mem_cgroup_legacy_files[] = {
>>  	},
>>  	{
>>  		.name = "pressure_level",
>> +		.seq_show = mem_cgroup_dummy_seq_show,
>>  	},
>>  #ifdef CONFIG_NUMA
>>  	{
>> --
>> 2.32.0
>
Hello.

On Wed, Mar 22, 2023 at 02:25:25PM +0000, Florian Schmidt <flosch@nutanix.com> wrote:
> cgroups v1 has a unique way of setting up memory pressure notifications:
...
> There are several ways around this issue, but adding a dummy read
> handler seems like the least invasive to me. I'd be interested to hear:
> (a) do you think there is a less invasive way? Alternatively, we could
> add a flag in cftype in include/linux/cgroup-defs.h, but that seems
> more invasive for what is a legacy interface.

You can (as a privileged user) modify the file perms in userspace first
(e.g. chmod o+r memory.pressure_level) and then it can be used by
non-privileged users. (Or do LSMs prevent you from that too?)

> (b) would you be interested to take this patch, or is it too niche a fix
> for a legacy subsystem?

I'd rather not extend this "unique way" with additionally unique dummy
helpers.

My 0.02 €,
Michal
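For a setup tool written in C, the userspace workaround suggested above (the equivalent of `chmod o+r memory.pressure_level`) might look like the sketch below. The helper name `add_other_read()` is hypothetical, and whether an LSM policy permits the mode change is an assumption left open, as in the message itself.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Equivalent of `chmod o+r <path>`: keep the existing permission bits
 * and add read access for "other". Returns 0 on success, -1 on error.
 */
static int add_other_read(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0) {
		perror("stat");
		return -1;
	}
	if (chmod(path, (st.st_mode & 07777) | S_IROTH) != 0) {
		perror("chmod");
		return -1;
	}
	return 0;
}
```

A privileged helper would call this once on the cgroup's `memory.pressure_level` file before unprivileged subscribers start.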
Hi Michal,

On 24/03/2023 15:03, Michal Koutný wrote:
> On Wed, Mar 22, 2023 at 02:25:25PM +0000, Florian Schmidt <flosch@nutanix.com> wrote:
>> cgroups v1 has a unique way of setting up memory pressure notifications:
> ...
>> There are several ways around this issue, but adding a dummy read
>> handler seems like the least invasive to me. I'd be interested to hear:
>> (a) do you think there is a less invasive way? Alternatively, we could
>> add a flag in cftype in include/linux/cgroup-defs.h, but that seems
>> more invasive for what is a legacy interface.
>
> You can (as privileged user) modify file perms in userspace first (e.g.
> chmod o+r memory.pressure_level) and then it can be used by non-privileged
> users. (Or do LSM prevent you from that too?)

That's true, we can work around this in userspace (though it means you
need to give the process additional permissions: the right to change file
permissions, on top of just reading and writing).

However, considering that memcg_write_event_control() explicitly checks
whether the caller has read permissions on pressure_level, it felt
sensible to me that the file would be created with read permissions in
the first place, just like all the other files are created with
permissions that are suitable for their immediate use without having to
manually change permissions. The current implementation feels
inconsistent in that way.

>> (b) would you be interested to take this patch, or is it too niche a fix
>> for a legacy subsystem?
>
> I'd rather not extend this "unique way" with additionally unique dummy
> helpers.

I understand that this is all code that has no modern user any more, which
is why I tried to keep the fix as self-contained as possible.
Another option would be to have a special handler in cgroup_file_mode(), but
that feels a lot clunkier to me, and leaks a v1-specific behaviour into the
shared cgroup code.

Cheers,
Florian
On Mon 27-03-23 14:59:37, Florian Schmidt wrote:
> Hi Michal,
>
> On 24/03/2023 15:03, Michal Koutný wrote:
> > On Wed, Mar 22, 2023 at 02:25:25PM +0000, Florian Schmidt <flosch@nutanix.com> wrote:
[...]
> > > (b) would you be interested to take this patch, or is it too niche a fix
> > > for a legacy subsystem?
> >
> > I'd rather not extend this "unique way" with additionally unique dummy
> > helpers.
>
> I understand that this is all code that has no modern user any more, which
> is why I tried to keep the fix as self-contained as possible.
> Another option would be to have a special handler in cgroup_file_mode(), but
> that feels a lot clunkier to me, and leaks a v1-specific behaviour into the
> shared cgroup code.

Yes, this is effectively a deprecated interface but I do agree that we
shouldn't really make the lives of users more complicated than necessary.
If the simplest solution to address this is to provide an empty callback
then so be it. I am not sure, but I do not think there are other cgroup
interfaces that would warrant a more generic solution.
Hi all,

to summarise, I've heard generally positive feedback from Michal H and
some more reserved, but not fundamentally opposed feedback from Michal K.
Thanks to both of you.

Since there's been no other feedback for the last few days, I'll raise a
proper patch, and any potential further discussion can then be done on that.

Cheers,
Florian
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5abffe6f8389..e48c749d9724 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3734,6 +3734,16 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
 	}
 }
 
+/*
+ * This function doesn't do anything useful. Its only job is to provide a read
+ * handler so that the file gets read permissions when it's created.
+ */
+static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m,
+				     __always_unused void *v)
+{
+	return -EINVAL;
+}
+
 #ifdef CONFIG_MEMCG_KMEM
 static int memcg_online_kmem(struct mem_cgroup *memcg)
 {
@@ -5064,6 +5074,7 @@ static struct cftype mem_cgroup_legacy_files[] = {
 	},
 	{
 		.name = "pressure_level",
+		.seq_show = mem_cgroup_dummy_seq_show,
 	},
 #ifdef CONFIG_NUMA
 	{