[RFC] memcg v1: provide read access to memory.pressure_level

Message ID 20230322142525.162469-1-flosch@nutanix.com (mailing list archive)
State New

Commit Message

Florian Schmidt March 22, 2023, 2:25 p.m. UTC
cgroups v1 has a unique way of setting up memory pressure notifications:
the user opens "memory.pressure_level" of the cgroup they want to
monitor for pressure, then opens "cgroup.event_control" and writes the fd
(among other things) to that file. memory.pressure_level has no other
use; specifically, it does not support any read or write operations.
Consequently, no handlers are provided, and the file ends up with
permissions 000. However, to actually use the mechanism, the subscribing
user must have read access to the file and open the fd for reading, see
memcg_write_event_control().

This is all fine as long as the subscribing process runs as root and is
otherwise unconfined by further restrictions. However, if you add strict
access controls such as SELinux, the permission bits will be enforced,
and opening memory.pressure_level for reading will fail, preventing the
process from subscribing, even as root.

There are several ways around this issue, but adding a dummy read
handler seems like the least invasive to me. I'd be interested to hear:
(a) do you think there is a less invasive way? Alternatively, we could
    add a flag in cftype in include/linux/cgroup-defs.h, but that seems
    more invasive for what is a legacy interface.
(b) would you be interested to take this patch, or is it too niche a fix
    for a legacy subsystem?
---
 mm/memcontrol.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Michal Hocko March 22, 2023, 3:57 p.m. UTC | #1
On Wed 22-03-23 14:25:25, Florian Schmidt wrote:
> cgroups v1 has a unique way of setting up memory pressure notifications:
> the user opens "memory.pressure_level" of the cgroup they want to
> monitor for pressure, then opens "cgroup.event_control" and writes the fd
> (among other things) to that file. memory.pressure_level has no other
> use; specifically, it does not support any read or write operations.
> Consequently, no handlers are provided, and the file ends up with
> permissions 000. However, to actually use the mechanism, the subscribing
> user must have read access to the file and open the fd for reading, see
> memcg_write_event_control().
> 
> This is all fine as long as the subscribing process runs as root and is
> otherwise unconfined by further restrictions. However, if you add strict
> access controls such as SELinux, the permission bits will be enforced,
> and opening memory.pressure_level for reading will fail, preventing the
> process from subscribing, even as root.
>
> 
> There are several ways around this issue, but adding a dummy read
> handler seems like the least invasive to me.

I was struggling to see how that addresses the problem because all you
need is a read permission. But then I've looked into cgroup code and
learned that permissions are constructed based on available callbacks
(cgroup_file_mode). This would have made the review easier ;)
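
For reference, the way cgroup_file_mode() derives permissions from the
available callbacks can be modelled in userspace roughly like this. It is a
simplified sketch and not the kernel source: struct cftype_model,
MODEL_WORLD_WRITABLE, file_mode_model() and dummy_seq_show() are made-up
stand-ins, and only the seq_show and write handlers are modelled (as far as I
can tell, the kernel also considers the u64/s64 read and write handlers).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Stands in for the kernel's CFTYPE_WORLD_WRITABLE flag (illustration only). */
#define MODEL_WORLD_WRITABLE 0x1

/* Heavily simplified stand-in for the kernel's struct cftype. */
struct cftype_model {
	const char *name;
	int (*seq_show)(void *m, void *v);
	long (*write)(const char *buf, size_t nbytes);
	unsigned int flags;
};

/* Derive the file mode purely from which handlers are present. */
static mode_t file_mode_model(const struct cftype_model *cft)
{
	mode_t mode = 0;

	if (cft->seq_show)
		mode |= S_IRUSR | S_IRGRP | S_IROTH;	/* readable by all */

	if (cft->write) {
		if (cft->flags & MODEL_WORLD_WRITABLE)
			mode |= S_IWUSR | S_IWGRP | S_IWOTH;
		else
			mode |= S_IWUSR;		/* owner only */
	}

	return mode;
}

/* Mirrors the patch's dummy handler: it exists only to be non-NULL. */
static int dummy_seq_show(void *m, void *v)
{
	(void)m;
	(void)v;
	return -EINVAL;
}
```

In this model, an entry with no handlers gets mode 0, matching the 000
permissions described in the commit message, while the same entry with
.seq_show = dummy_seq_show gets 0444.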

I have no issue with the patch. It would be great to hear from cgroup
maintainers whether a concept of default permissions is something that
would be useful also for other files.

> I'd be interested to hear:
> (a) do you think there is a less invasive way? Alternatively, we could
>     add a flag in cftype in include/linux/cgroup-defs.h, but that seems
>     more invasive for what is a legacy interface.
> (b) would you be interested to take this patch, or is it too niche a fix
>     for a legacy subsystem?

After you add your s-o-b, feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>

If cgroup people find a concept of default permissions for a cgroup file
sound then this could be replaced by that approach but this is really an
easy workaround.
> ---
>  mm/memcontrol.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5abffe6f8389..e48c749d9724 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3734,6 +3734,16 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
>  	}
>  }
>  
> +/*
> + * This function doesn't do anything useful. Its only job is to provide a read
> + * handler so that the file gets read permissions when it's created.

I would just reference cgroup_file_mode() in the comment to make our
lives easier and the comment more helpful.

> + */
> +static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m,
> +				     __always_unused void *v)
> +{
> +	return -EINVAL;
> +}
> +
>  #ifdef CONFIG_MEMCG_KMEM
>  static int memcg_online_kmem(struct mem_cgroup *memcg)
>  {
> @@ -5064,6 +5074,7 @@ static struct cftype mem_cgroup_legacy_files[] = {
>  	},
>  	{
>  		.name = "pressure_level",
> +		.seq_show = mem_cgroup_dummy_seq_show,
>  	},
>  #ifdef CONFIG_NUMA
>  	{
> -- 
> 2.32.0
Florian Schmidt March 22, 2023, 4 p.m. UTC | #2
On 22/03/2023 15:57, Michal Hocko wrote:
> On Wed 22-03-23 14:25:25, Florian Schmidt wrote:
>> cgroups v1 has a unique way of setting up memory pressure notifications:
>> the user opens "memory.pressure_level" of the cgroup they want to
>> monitor for pressure, then opens "cgroup.event_control" and writes the fd
>> (among other things) to that file. memory.pressure_level has no other
>> use; specifically, it does not support any read or write operations.
>> Consequently, no handlers are provided, and the file ends up with
>> permissions 000. However, to actually use the mechanism, the subscribing
>> user must have read access to the file and open the fd for reading, see
>> memcg_write_event_control().
>>
>> This is all fine as long as the subscribing process runs as root and is
>> otherwise unconfined by further restrictions. However, if you add strict
>> access controls such as SELinux, the permission bits will be enforced,
>> and opening memory.pressure_level for reading will fail, preventing the
>> process from subscribing, even as root.
>>
>>
>> There are several ways around this issue, but adding a dummy read
>> handler seems like the least invasive to me.
> 
> I was struggling to see how that addresses the problem because all you
> need is a read permission. But then I've looked into cgroup code and
> learned that permissions are constructed based on available callbacks
> (cgroup_file_mode). This would have made the review easier ;)

Oh, sorry, I forgot to mention that salient detail!
I didn't check whether that was a common pattern or not...


> 
> I have no issue with the patch. It would be great to hear from cgroup
> maintainers whether a concept of default permissions is something that
> would be useful also for other files.
> 
>> I'd be interested to hear:
>> (a) do you think there is a less invasive way? Alternatively, we could
>>      add a flag in cftype in include/linux/cgroup-defs.h, but that seems
>>      more invasive for what is a legacy interface.
>> (b) would you be interested to take this patch, or is it too niche a fix
>>      for a legacy subsystem?
> 
> After you add your s-o-b, feel free to add
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
> If cgroup people find a concept of default permissions for a cgroup file
> sound then this could be replaced by that approach but this is really an
> easy workaround.

Will do, once I know the path forward and construct a proper commit 
message, I'll add the s-o-b and ack.

>> ---
>>   mm/memcontrol.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 5abffe6f8389..e48c749d9724 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3734,6 +3734,16 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
>>   	}
>>   }
>>   
>> +/*
>> + * This function doesn't do anything useful. Its only job is to provide a read
>> + * handler so that the file gets read permissions when it's created.
> 
> I would just reference cgroup_file_mode() in the comment to make our
> lives easier and the comment more helpful.

Ack.


> 
>> + */
>> +static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m,
>> +				     __always_unused void *v)
>> +{
>> +	return -EINVAL;
>> +}
>> +
>>   #ifdef CONFIG_MEMCG_KMEM
>>   static int memcg_online_kmem(struct mem_cgroup *memcg)
>>   {
>> @@ -5064,6 +5074,7 @@ static struct cftype mem_cgroup_legacy_files[] = {
>>   	},
>>   	{
>>   		.name = "pressure_level",
>> +		.seq_show = mem_cgroup_dummy_seq_show,
>>   	},
>>   #ifdef CONFIG_NUMA
>>   	{
>> -- 
>> 2.32.0
>
Michal Koutný March 24, 2023, 3:03 p.m. UTC | #3
Hello.

On Wed, Mar 22, 2023 at 02:25:25PM +0000, Florian Schmidt <flosch@nutanix.com> wrote:
> cgroups v1 has a unique way of setting up memory pressure notifications:
...
> There are several ways around this issue, but adding a dummy read
> handler seems like the least invasive to me. I'd be interested to hear:
> (a) do you think there is a less invasive way? Alternatively, we could
>     add a flag in cftype in include/linux/cgroup-defs.h, but that seems
>     more invasive for what is a legacy interface.

You can (as a privileged user) modify the file perms in userspace first
(e.g. chmod o+r memory.pressure_level), and then it can be used by
non-privileged users. (Or do LSMs prevent you from that too?)
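
As a rough illustration, the programmatic equivalent of that chmod o+r could
look like the sketch below. The helper name add_other_read() is hypothetical,
and the stat-then-chmod sequence is not atomic, which is assumed to be
acceptable for a one-off setup step run by a privileged process.

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Add read permission for "others" to path while keeping the existing
 * permission bits. Returns 0 on success, -1 on error (errno is set).
 */
static int add_other_read(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;

	/* Keep only the permission bits, then OR in o+r. */
	return chmod(path, (st.st_mode & 07777) | S_IROTH);
}
```

The caller needs permission to change the file's mode, which is an extra
privilege on top of reading pressure_level and writing event_control.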

> (b) would you be interested to take this patch, or is it too niche a fix
>     for a legacy subsystem?

I'd rather not extend this "unique way" with additionally unique dummy
helpers.

My 0.02 €,
Michal
Florian Schmidt March 27, 2023, 1:59 p.m. UTC | #4
Hi Michal,

On 24/03/2023 15:03, Michal Koutný wrote:
> On Wed, Mar 22, 2023 at 02:25:25PM +0000, Florian Schmidt <flosch@nutanix.com> wrote:
>> cgroups v1 has a unique way of setting up memory pressure notifications:
> ...
>> There are several ways around this issue, but adding a dummy read
>> handler seems like the least invasive to me. I'd be interested to hear:
>> (a) do you think there is a less invasive way? Alternatively, we could
>>      add a flag in cftype in include/linux/cgroup-defs.h, but that seems
>>      more invasive for what is a legacy interface.
> 
> You can (as a privileged user) modify the file perms in userspace first
> (e.g. chmod o+r memory.pressure_level), and then it can be used by
> non-privileged users. (Or do LSMs prevent you from that too?)

That's true, we can work around this in userspace (though it means you 
need to give the process additional permissions, to change file 
permissions on top of just reading and writing).

Though considering that memcg_write_event_control() explicitly
checks whether the caller has read permissions on pressure_level, it 
felt sensible to me that the file would be created with read permissions 
in the first place, just like all the other files are created with 
permissions that are suitable for their immediate use without having to 
manually change permissions. The current implementation feels 
inconsistent in that way.


>> (b) would you be interested to take this patch, or is it too niche a fix
>>      for a legacy subsystem?
> 
> I'd rather not extend this "unique way" with additionally unique dummy
> helpers.

I understand that this is all code that has no modern user any more, 
which is why I tried to keep the fix as self-contained as possible.
Another option would be to have a special handler in cgroup_file_mode(), 
but that feels a lot klunkier to me, and leaks a v1-specific behaviour 
into the shared cgroup code.


Cheers,
Florian
Michal Hocko March 27, 2023, 8:40 p.m. UTC | #5
On Mon 27-03-23 14:59:37, Florian Schmidt wrote:
> Hi Michal,
> 
> On 24/03/2023 15:03, Michal Koutný wrote:
> > On Wed, Mar 22, 2023 at 02:25:25PM +0000, Florian Schmidt <flosch@nutanix.com> wrote:
[...]
> > > (b) would you be interested to take this patch, or is it too niche a fix
> > >      for a legacy subsystem?
> > 
> > I'd rather not extend this "unique way" with additionally unique dummy
> > helpers.
> 
> I understand that this is all code that has no modern user any more, which
> is why I tried to keep the fix as self-contained as possible.
> Another option would be to have a special handler in cgroup_file_mode(), but
> that feels a lot klunkier to me, and leaks a v1-specific behaviour into the
> shared cgroup code.

Yes, this is effectively a deprecated interface, but I do agree that we
shouldn't make users' lives more complicated than necessary. If the
simplest solution to address this is to provide an empty callback, then
so be it. I am not sure, but I do not think there are other cgroup
interfaces that would warrant a more generic solution.
Florian Schmidt April 4, 2023, 8:44 a.m. UTC | #6
Hi all,

to summarise, I've heard generally positive feedback from Michal H and 
some more reserved, but not fundamentally opposed feedback from Michal 
K. Thanks to both of you.

Since there's been no other feedback for the last few days, I'll raise a 
proper patch, and any potential further discussion can then be done on that.

Cheers,
Florian
Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5abffe6f8389..e48c749d9724 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3734,6 +3734,16 @@  static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
 	}
 }
 
+/*
+ * This function doesn't do anything useful. Its only job is to provide a read
+ * handler so that the file gets read permissions when it's created.
+ */
+static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m,
+				     __always_unused void *v)
+{
+	return -EINVAL;
+}
+
 #ifdef CONFIG_MEMCG_KMEM
 static int memcg_online_kmem(struct mem_cgroup *memcg)
 {
@@ -5064,6 +5074,7 @@  static struct cftype mem_cgroup_legacy_files[] = {
 	},
 	{
 		.name = "pressure_level",
+		.seq_show = mem_cgroup_dummy_seq_show,
 	},
 #ifdef CONFIG_NUMA
 	{