diff mbox series

[cgroup] cgroup: set the correct return code if hierarchy limits are reached

Message ID d8a9e9c6-856e-1502-95ac-abf9700ff568@openvz.org (mailing list archive)
State New
Headers show
Series [cgroup] cgroup: set the correct return code if hierarchy limits are reached | expand

Commit Message

Vasily Averin June 27, 2022, 2:12 a.m. UTC
When cgroup_mkdir reaches the limits of the cgroup hierarchy, it should
not return -EAGAIN, but instead react similarly to reaching the global
limit.

Signed-off-by: Vasily Averin <vvs@openvz.org>
---
 kernel/cgroup/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Muchun Song June 27, 2022, 3:33 a.m. UTC | #1
On Mon, Jun 27, 2022 at 10:12 AM Vasily Averin <vvs@openvz.org> wrote:
>
> When cgroup_mkdir reaches the limits of the cgroup hierarchy, it should
> not return -EAGAIN, but instead react similarly to reaching the global
> limit.
>
> Signed-off-by: Vasily Averin <vvs@openvz.org>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.
Tejun Heo June 27, 2022, 9:07 a.m. UTC | #2
On Mon, Jun 27, 2022 at 05:12:55AM +0300, Vasily Averin wrote:
> When cgroup_mkdir reaches the limits of the cgroup hierarchy, it should
> not return -EAGAIN, but instead react similarly to reaching the global
> limit.

While I'm not necessarily against this change, I find the rationale to
be somewhat lacking. Can you please elaborate why -ENOSPC is the right
one while -EAGAIN is incorrect?

Thanks.
Roman Gushchin June 28, 2022, 12:44 a.m. UTC | #3
On Mon, Jun 27, 2022 at 05:12:55AM +0300, Vasily Averin wrote:
> When cgroup_mkdir reaches the limits of the cgroup hierarchy, it should
> not return -EAGAIN, but instead react similarly to reaching the global
> limit.
> 
> Signed-off-by: Vasily Averin <vvs@openvz.org>
> ---
>  kernel/cgroup/cgroup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 1be0f81fe8e1..243239553ea3 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -5495,7 +5495,7 @@ int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
>  		return -ENODEV;
>  
>  	if (!cgroup_check_hierarchy_limits(parent)) {
> -		ret = -EAGAIN;
> +		ret = -ENOSPC;

I'd not argue whether ENOSPC is better or worse here, but I don't think we need
to change it now. It's been in this state for a long time and is a part of ABI.
EAGAIN is pretty unique as a mkdir() result, so systemd can handle it well.

Thanks!
Vasily Averin June 28, 2022, 3:59 a.m. UTC | #4
On 6/28/22 03:44, Roman Gushchin wrote:
> On Mon, Jun 27, 2022 at 05:12:55AM +0300, Vasily Averin wrote:
>> When cgroup_mkdir reaches the limits of the cgroup hierarchy, it should
>> not return -EAGAIN, but instead react similarly to reaching the global
>> limit.
>>
>> Signed-off-by: Vasily Averin <vvs@openvz.org>
>> ---
>>  kernel/cgroup/cgroup.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>> index 1be0f81fe8e1..243239553ea3 100644
>> --- a/kernel/cgroup/cgroup.c
>> +++ b/kernel/cgroup/cgroup.c
>> @@ -5495,7 +5495,7 @@ int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
>>  		return -ENODEV;
>>  
>>  	if (!cgroup_check_hierarchy_limits(parent)) {
>> -		ret = -EAGAIN;
>> +		ret = -ENOSPC;
> 
> I'd not argue whether ENOSPC is better or worse here, but I don't think we need
> to change it now. It's been in this state for a long time and is a part of ABI.
> EAGAIN is pretty unique as a mkdir() result, so systemd can handle it well.

I would agree with you, however in my opinion EAGAIN is used to restart an
interrupted system call. Thus, I worry its return can loop the user space without
any chance of continuation.

However, maybe I'm confusing something?

Thank you,
	Vasily Averin
Michal Koutný June 28, 2022, 9:16 a.m. UTC | #5
On Tue, Jun 28, 2022 at 06:59:06AM +0300, Vasily Averin <vvs@openvz.org> wrote:
> I would agree with you, however in my opinion EAGAIN is used to restart an
> interrupted system call. Thus, I worry its return can loop the user space without
> any chance of continuation.
> 
> However, maybe I'm confusing something?

The mkdir(2) manpage doesn't list EAGAIN at all. ENOSPC makes better
sense here. (And I suspect the dependency on this particular value won't
be very wide spread.)

0.02€
Michal
Tejun Heo June 28, 2022, 9:22 a.m. UTC | #6
On Tue, Jun 28, 2022 at 11:16:48AM +0200, Michal Koutný wrote:
> The mkdir(2) manpage doesn't list EAGAIN at all. ENOSPC makes better
> sense here. (And I suspect the dependency on this particular value won't
> be very wide spread.)

Given how we use these system calls as triggers for random kernel
operations, I don't think adhering to posix standard is necessary or
possible. Using an error code which isn't listed in the man page isn't
particularly high in the list of discrepancies.

Again, I'm not against changing it but I'd like to see better
rationales. On one side, we have "it's been this way for a long time
and there's nothing particularly broken about it". I'm not sure the
arguments we have for the other side is strong enough yet.

Thanks.
Vasily Averin June 29, 2022, 6:13 a.m. UTC | #7
On 6/28/22 12:22, Tejun Heo wrote:
> On Tue, Jun 28, 2022 at 11:16:48AM +0200, Michal Koutný wrote:
>> The mkdir(2) manpage doesn't list EAGAIN at all. ENOSPC makes better
>> sense here. (And I suspect the dependency on this particular value won't
>> be very wide spread.)
> 
> Given how we use these system calls as triggers for random kernel
> operations, I don't think adhering to posix standard is necessary or
> possible. Using an error code which isn't listed in the man page isn't
> particularly high in the list of discrepancies.
> 
> Again, I'm not against changing it but I'd like to see better
> rationales. On one side, we have "it's been this way for a long time
> and there's nothing particularly broken about it". I'm not sure the
> arguments we have for the other side is strong enough yet.

I would like to recall this patch.

I experimented on fedora36 node with LXC and centos stream 9 container.
and I did not noticed any critical systemd troubles with original -EAGAIN.
When cgroup's limit is reached systemd cannot start new services, 
for example lxc-attach generates following output:

[root@fc34-vvs ~]# lxc-attach c9s
lxc-attach: c9s: cgroups/cgfsng.c: cgroup_attach_leaf: 2084 Resource temporarily unavailable - Failed to create leaf cgroup ".lxc"
lxc-attach: c9s: cgroups/cgfsng.c: __cgroup_attach_many: 3517 Resource temporarily unavailable - Failed to attach to cgroup fd 11
lxc-attach: c9s: attach.c: lxc_attach: 1679 Resource temporarily unavailable - Failed to attach cgroup
lxc-attach: c9s: attach.c: do_attach: 1237 No data available - Failed to receive lsm label fd
lxc-attach: c9s: attach.c: do_attach: 1375 Failed to attach to container

I did not found any loop in userspace caused by EAGAIN.
Messages looks unclear, however situation with the patched kernel is not much better:

[root@fc34-vvs ~]# lxc-attach c9s
lxc-attach: c9s: cgroups/cgfsng.c: cgroup_attach_leaf: 2084 No space left on device - Failed to create leaf cgroup ".lxc"
lxc-attach: c9s: cgroups/cgfsng.c: __cgroup_attach_many: 3517 No space left on device - Failed to attach to cgroup fd 11
lxc-attach: c9s: attach.c: lxc_attach: 1679 No space left on device - Failed to attach cgroup
lxc-attach: c9s: attach.c: do_attach: 1237 No data available - Failed to receive lsm label fd
lxc-attach: c9s: attach.c: do_attach: 1375 Failed to attach to container

Thank you,
	Vasily Averin
Tejun Heo June 29, 2022, 7:25 p.m. UTC | #8
On Wed, Jun 29, 2022 at 09:13:02AM +0300, Vasily Averin wrote:
> I experimented on fedora36 node with LXC and centos stream 9 container.
> and I did not noticed any critical systemd troubles with original -EAGAIN.
> When cgroup's limit is reached systemd cannot start new services, 
> for example lxc-attach generates following output:
> 
> [root@fc34-vvs ~]# lxc-attach c9s
> lxc-attach: c9s: cgroups/cgfsng.c: cgroup_attach_leaf: 2084 Resource temporarily unavailable - Failed to create leaf cgroup ".lxc"
> lxc-attach: c9s: cgroups/cgfsng.c: __cgroup_attach_many: 3517 Resource temporarily unavailable - Failed to attach to cgroup fd 11
> lxc-attach: c9s: attach.c: lxc_attach: 1679 Resource temporarily unavailable - Failed to attach cgroup
> lxc-attach: c9s: attach.c: do_attach: 1237 No data available - Failed to receive lsm label fd
> lxc-attach: c9s: attach.c: do_attach: 1375 Failed to attach to container
> 
> I did not found any loop in userspace caused by EAGAIN.
> Messages looks unclear, however situation with the patched kernel is not much better:
> 
> [root@fc34-vvs ~]# lxc-attach c9s
> lxc-attach: c9s: cgroups/cgfsng.c: cgroup_attach_leaf: 2084 No space left on device - Failed to create leaf cgroup ".lxc"
> lxc-attach: c9s: cgroups/cgfsng.c: __cgroup_attach_many: 3517 No space left on device - Failed to attach to cgroup fd 11
> lxc-attach: c9s: attach.c: lxc_attach: 1679 No space left on device - Failed to attach cgroup
> lxc-attach: c9s: attach.c: do_attach: 1237 No data available - Failed to receive lsm label fd
> lxc-attach: c9s: attach.c: do_attach: 1375 Failed to attach to container

I'd say "resource temporarily unavailable" is better fitting than "no
space left on device" and the syscall restart thing isn't handled by
-EAGAIN return value. Grep restart_block for that.

Thanks.
Roman Gushchin July 1, 2022, 2:42 a.m. UTC | #9
On Thu, Jun 30, 2022 at 04:25:57AM +0900, Tejun Heo wrote:
> On Wed, Jun 29, 2022 at 09:13:02AM +0300, Vasily Averin wrote:
> > I experimented on fedora36 node with LXC and centos stream 9 container.
> > and I did not noticed any critical systemd troubles with original -EAGAIN.
> > When cgroup's limit is reached systemd cannot start new services, 
> > for example lxc-attach generates following output:
> > 
> > [root@fc34-vvs ~]# lxc-attach c9s
> > lxc-attach: c9s: cgroups/cgfsng.c: cgroup_attach_leaf: 2084 Resource temporarily unavailable - Failed to create leaf cgroup ".lxc"
> > lxc-attach: c9s: cgroups/cgfsng.c: __cgroup_attach_many: 3517 Resource temporarily unavailable - Failed to attach to cgroup fd 11
> > lxc-attach: c9s: attach.c: lxc_attach: 1679 Resource temporarily unavailable - Failed to attach cgroup
> > lxc-attach: c9s: attach.c: do_attach: 1237 No data available - Failed to receive lsm label fd
> > lxc-attach: c9s: attach.c: do_attach: 1375 Failed to attach to container
> > 
> > I did not found any loop in userspace caused by EAGAIN.
> > Messages looks unclear, however situation with the patched kernel is not much better:
> > 
> > [root@fc34-vvs ~]# lxc-attach c9s
> > lxc-attach: c9s: cgroups/cgfsng.c: cgroup_attach_leaf: 2084 No space left on device - Failed to create leaf cgroup ".lxc"
> > lxc-attach: c9s: cgroups/cgfsng.c: __cgroup_attach_many: 3517 No space left on device - Failed to attach to cgroup fd 11
> > lxc-attach: c9s: attach.c: lxc_attach: 1679 No space left on device - Failed to attach cgroup
> > lxc-attach: c9s: attach.c: do_attach: 1237 No data available - Failed to receive lsm label fd
> > lxc-attach: c9s: attach.c: do_attach: 1375 Failed to attach to container
> 
> I'd say "resource temporarily unavailable" is better fitting than "no
> space left on device"

+1

Thanks!
diff mbox series

Patch

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1be0f81fe8e1..243239553ea3 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5495,7 +5495,7 @@  int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
 		return -ENODEV;
 
 	if (!cgroup_check_hierarchy_limits(parent)) {
-		ret = -EAGAIN;
+		ret = -ENOSPC;
 		goto out_unlock;
 	}