Message ID | d8a9e9c6-856e-1502-95ac-abf9700ff568@openvz.org (mailing list archive) |
---|---|
State | New |
Series | [cgroup] cgroup: set the correct return code if hierarchy limits are reached |
On Mon, Jun 27, 2022 at 10:12 AM Vasily Averin <vvs@openvz.org> wrote:
>
> When cgroup_mkdir reaches the limits of the cgroup hierarchy, it should
> not return -EAGAIN, but instead react similarly to reaching the global
> limit.
>
> Signed-off-by: Vasily Averin <vvs@openvz.org>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

On Mon, Jun 27, 2022 at 05:12:55AM +0300, Vasily Averin wrote:
> When cgroup_mkdir reaches the limits of the cgroup hierarchy, it should
> not return -EAGAIN, but instead react similarly to reaching the global
> limit.

While I'm not necessarily against this change, I find the rationale to be
somewhat lacking. Can you please elaborate why -ENOSPC is the right one
while -EAGAIN is incorrect?

Thanks.

On Mon, Jun 27, 2022 at 05:12:55AM +0300, Vasily Averin wrote:
> When cgroup_mkdir reaches the limits of the cgroup hierarchy, it should
> not return -EAGAIN, but instead react similarly to reaching the global
> limit.
>
> Signed-off-by: Vasily Averin <vvs@openvz.org>
> ---
>  kernel/cgroup/cgroup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 1be0f81fe8e1..243239553ea3 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -5495,7 +5495,7 @@ int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
>  		return -ENODEV;
>
>  	if (!cgroup_check_hierarchy_limits(parent)) {
> -		ret = -EAGAIN;
> +		ret = -ENOSPC;

I'd not argue whether ENOSPC is better or worse here, but I don't think we need
to change it now. It's been in this state for a long time and is a part of ABI.
EAGAIN is pretty unique as a mkdir() result, so systemd can handle it well.

Thanks!

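To illustrate the ABI point, here is a minimal userspace sketch in C of how a
caller might tell the hierarchy-limit case apart from other mkdir() failures.
The enum and function names are hypothetical and are not taken from systemd or
any real library. The mapping of EAGAIN to "hierarchy limit reached" assumes
the unpatched kernel behaviour discussed in this thread; ENOSPC is what the
proposed patch would return instead.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical result categories, for illustration only. */
enum cg_mkdir_result {
	CG_MKDIR_OK,
	CG_MKDIR_LIMIT_REACHED,	/* EAGAIN: hierarchy limit hit (unpatched kernel) */
	CG_MKDIR_NO_SPACE,	/* ENOSPC: what the proposed patch would return */
	CG_MKDIR_OTHER		/* any other failure */
};

/* Map a mkdir() return value and the saved errno to a category. */
enum cg_mkdir_result cg_classify_mkdir(int rc, int err)
{
	if (rc == 0)
		return CG_MKDIR_OK;

	switch (err) {
	case EAGAIN:
		return CG_MKDIR_LIMIT_REACHED;
	case ENOSPC:
		return CG_MKDIR_NO_SPACE;
	default:
		return CG_MKDIR_OTHER;
	}
}

/* Create a cgroup directory and classify the outcome. */
enum cg_mkdir_result cg_try_mkdir(const char *path)
{
	int rc = mkdir(path, 0755);
	int err = errno;	/* save errno right after the call */

	return cg_classify_mkdir(rc, err);
}
```

A caller that only checks for EAGAIN would stop recognising the limit case if
the kernel switched to ENOSPC, which is the compatibility concern raised above.
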
On 6/28/22 03:44, Roman Gushchin wrote:
> On Mon, Jun 27, 2022 at 05:12:55AM +0300, Vasily Averin wrote:
>> When cgroup_mkdir reaches the limits of the cgroup hierarchy, it should
>> not return -EAGAIN, but instead react similarly to reaching the global
>> limit.
>>
>> Signed-off-by: Vasily Averin <vvs@openvz.org>
>> ---
>>  kernel/cgroup/cgroup.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>> index 1be0f81fe8e1..243239553ea3 100644
>> --- a/kernel/cgroup/cgroup.c
>> +++ b/kernel/cgroup/cgroup.c
>> @@ -5495,7 +5495,7 @@ int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
>>  		return -ENODEV;
>>
>>  	if (!cgroup_check_hierarchy_limits(parent)) {
>> -		ret = -EAGAIN;
>> +		ret = -ENOSPC;
>
> I'd not argue whether ENOSPC is better or worse here, but I don't think we need
> to change it now. It's been in this state for a long time and is a part of ABI.
> EAGAIN is pretty unique as a mkdir() result, so systemd can handle it well.

I would agree with you, however in my opinion EAGAIN is used to restart an
interrupted system call. Thus, I worry its return can loop the user space without
any chance of continuation.

However, maybe I'm confusing something?

Thank you,
	Vasily Averin

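The looping concern can also be addressed on the caller's side, whichever error
code the kernel returns. The following is a sketch only, with hypothetical
names (op_fn, retry_bounded, always_eagain): it retries an operation that fails
with EAGAIN a bounded number of times, so a persistent EAGAIN, such as a
reached hierarchy limit, ends in an error instead of an endless loop.

```c
#include <errno.h>

/* Hypothetical callback type: returns 0 on success, -1 with errno set. */
typedef int (*op_fn)(void *arg);

/*
 * Call op() until it succeeds, fails with something other than EAGAIN,
 * or max_attempts calls have been made. Returns the last result of op(),
 * or -1 if max_attempts is not positive.
 */
int retry_bounded(op_fn op, void *arg, int max_attempts)
{
	int rc = -1;

	for (int i = 0; i < max_attempts; i++) {
		rc = op(arg);
		if (rc == 0 || errno != EAGAIN)
			break;
	}
	return rc;
}

/* Example operation that always fails with EAGAIN and counts its calls. */
int always_eagain(void *arg)
{
	int *calls = arg;

	(*calls)++;
	errno = EAGAIN;
	return -1;
}
```

With always_eagain and a limit of three attempts, retry_bounded gives up after
the third call and leaves errno set to EAGAIN for the caller to report.
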
On Tue, Jun 28, 2022 at 06:59:06AM +0300, Vasily Averin <vvs@openvz.org> wrote:
> I would agree with you, however in my opinion EAGAIN is used to restart an
> interrupted system call. Thus, I worry its return can loop the user space without
> any chance of continuation.
>
> However, maybe I'm confusing something?

The mkdir(2) manpage doesn't list EAGAIN at all. ENOSPC makes better
sense here. (And I suspect the dependency on this particular value won't
be very widespread.)

0.02€
Michal

On Tue, Jun 28, 2022 at 11:16:48AM +0200, Michal Koutný wrote:
> The mkdir(2) manpage doesn't list EAGAIN at all. ENOSPC makes better
> sense here. (And I suspect the dependency on this particular value won't
> be very widespread.)

Given how we use these system calls as triggers for random kernel
operations, I don't think adhering to posix standard is necessary or
possible. Using an error code which isn't listed in the man page isn't
particularly high in the list of discrepancies.

Again, I'm not against changing it but I'd like to see better
rationales. On one side, we have "it's been this way for a long time
and there's nothing particularly broken about it". I'm not sure the
arguments we have for the other side are strong enough yet.

Thanks.

On 6/28/22 12:22, Tejun Heo wrote:
> On Tue, Jun 28, 2022 at 11:16:48AM +0200, Michal Koutný wrote:
>> The mkdir(2) manpage doesn't list EAGAIN at all. ENOSPC makes better
>> sense here. (And I suspect the dependency on this particular value won't
>> be very widespread.)
>
> Given how we use these system calls as triggers for random kernel
> operations, I don't think adhering to posix standard is necessary or
> possible. Using an error code which isn't listed in the man page isn't
> particularly high in the list of discrepancies.
>
> Again, I'm not against changing it but I'd like to see better
> rationales. On one side, we have "it's been this way for a long time
> and there's nothing particularly broken about it". I'm not sure the
> arguments we have for the other side are strong enough yet.

I would like to recall this patch.

I experimented on a fedora36 node with LXC and a centos stream 9 container,
and I did not notice any critical systemd troubles with the original -EAGAIN.
When the cgroup's limit is reached systemd cannot start new services;
for example, lxc-attach generates the following output:

[root@fc34-vvs ~]# lxc-attach c9s
lxc-attach: c9s: cgroups/cgfsng.c: cgroup_attach_leaf: 2084 Resource temporarily unavailable - Failed to create leaf cgroup ".lxc"
lxc-attach: c9s: cgroups/cgfsng.c: __cgroup_attach_many: 3517 Resource temporarily unavailable - Failed to attach to cgroup fd 11
lxc-attach: c9s: attach.c: lxc_attach: 1679 Resource temporarily unavailable - Failed to attach cgroup
lxc-attach: c9s: attach.c: do_attach: 1237 No data available - Failed to receive lsm label fd
lxc-attach: c9s: attach.c: do_attach: 1375 Failed to attach to container

I did not find any loop in userspace caused by EAGAIN.
The messages look unclear, however the situation with the patched kernel is not much better:

[root@fc34-vvs ~]# lxc-attach c9s
lxc-attach: c9s: cgroups/cgfsng.c: cgroup_attach_leaf: 2084 No space left on device - Failed to create leaf cgroup ".lxc"
lxc-attach: c9s: cgroups/cgfsng.c: __cgroup_attach_many: 3517 No space left on device - Failed to attach to cgroup fd 11
lxc-attach: c9s: attach.c: lxc_attach: 1679 No space left on device - Failed to attach cgroup
lxc-attach: c9s: attach.c: do_attach: 1237 No data available - Failed to receive lsm label fd
lxc-attach: c9s: attach.c: do_attach: 1375 Failed to attach to container

Thank you,
	Vasily Averin

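The two message texts that differ between the outputs are the C library's
strings for the two errno values. A small sketch, assuming glibc-style
strerror() text in the default C locale; format_failure is a hypothetical
helper written for this illustration and is not LXC's actual logging code:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a "<errno text> - <what failed>" message in buf.
 * Returns the snprintf() result (the length the full message needs).
 */
int format_failure(char *buf, size_t len, int err, const char *what)
{
	return snprintf(buf, len, "%s - %s", strerror(err), what);
}
```

Called with EAGAIN this yields the "Resource temporarily unavailable" variant,
and with ENOSPC the "No space left on device" variant, so the patch changes
only that part of what the user sees.
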
On Wed, Jun 29, 2022 at 09:13:02AM +0300, Vasily Averin wrote:
> I experimented on a fedora36 node with LXC and a centos stream 9 container,
> and I did not notice any critical systemd troubles with the original -EAGAIN.
> When the cgroup's limit is reached systemd cannot start new services;
> for example, lxc-attach generates the following output:
>
> [root@fc34-vvs ~]# lxc-attach c9s
> lxc-attach: c9s: cgroups/cgfsng.c: cgroup_attach_leaf: 2084 Resource temporarily unavailable - Failed to create leaf cgroup ".lxc"
> lxc-attach: c9s: cgroups/cgfsng.c: __cgroup_attach_many: 3517 Resource temporarily unavailable - Failed to attach to cgroup fd 11
> lxc-attach: c9s: attach.c: lxc_attach: 1679 Resource temporarily unavailable - Failed to attach cgroup
> lxc-attach: c9s: attach.c: do_attach: 1237 No data available - Failed to receive lsm label fd
> lxc-attach: c9s: attach.c: do_attach: 1375 Failed to attach to container
>
> I did not find any loop in userspace caused by EAGAIN.
> The messages look unclear, however the situation with the patched kernel is not much better:
>
> [root@fc34-vvs ~]# lxc-attach c9s
> lxc-attach: c9s: cgroups/cgfsng.c: cgroup_attach_leaf: 2084 No space left on device - Failed to create leaf cgroup ".lxc"
> lxc-attach: c9s: cgroups/cgfsng.c: __cgroup_attach_many: 3517 No space left on device - Failed to attach to cgroup fd 11
> lxc-attach: c9s: attach.c: lxc_attach: 1679 No space left on device - Failed to attach cgroup
> lxc-attach: c9s: attach.c: do_attach: 1237 No data available - Failed to receive lsm label fd
> lxc-attach: c9s: attach.c: do_attach: 1375 Failed to attach to container

I'd say "resource temporarily unavailable" is better fitting than "no
space left on device" and the syscall restart thing isn't handled by
-EAGAIN return value. Grep restart_block for that.

Thanks.

On Thu, Jun 30, 2022 at 04:25:57AM +0900, Tejun Heo wrote:
> On Wed, Jun 29, 2022 at 09:13:02AM +0300, Vasily Averin wrote:
> > I experimented on a fedora36 node with LXC and a centos stream 9 container,
> > and I did not notice any critical systemd troubles with the original -EAGAIN.
> > When the cgroup's limit is reached systemd cannot start new services;
> > for example, lxc-attach generates the following output:
> >
> > [root@fc34-vvs ~]# lxc-attach c9s
> > lxc-attach: c9s: cgroups/cgfsng.c: cgroup_attach_leaf: 2084 Resource temporarily unavailable - Failed to create leaf cgroup ".lxc"
> > lxc-attach: c9s: cgroups/cgfsng.c: __cgroup_attach_many: 3517 Resource temporarily unavailable - Failed to attach to cgroup fd 11
> > lxc-attach: c9s: attach.c: lxc_attach: 1679 Resource temporarily unavailable - Failed to attach cgroup
> > lxc-attach: c9s: attach.c: do_attach: 1237 No data available - Failed to receive lsm label fd
> > lxc-attach: c9s: attach.c: do_attach: 1375 Failed to attach to container
> >
> > I did not find any loop in userspace caused by EAGAIN.
> > The messages look unclear, however the situation with the patched kernel is not much better:
> >
> > [root@fc34-vvs ~]# lxc-attach c9s
> > lxc-attach: c9s: cgroups/cgfsng.c: cgroup_attach_leaf: 2084 No space left on device - Failed to create leaf cgroup ".lxc"
> > lxc-attach: c9s: cgroups/cgfsng.c: __cgroup_attach_many: 3517 No space left on device - Failed to attach to cgroup fd 11
> > lxc-attach: c9s: attach.c: lxc_attach: 1679 No space left on device - Failed to attach cgroup
> > lxc-attach: c9s: attach.c: do_attach: 1237 No data available - Failed to receive lsm label fd
> > lxc-attach: c9s: attach.c: do_attach: 1375 Failed to attach to container
>
> I'd say "resource temporarily unavailable" is better fitting than "no
> space left on device"

+1

Thanks!

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1be0f81fe8e1..243239553ea3 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5495,7 +5495,7 @@ int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
 		return -ENODEV;
 
 	if (!cgroup_check_hierarchy_limits(parent)) {
-		ret = -EAGAIN;
+		ret = -ENOSPC;
 		goto out_unlock;
 	}
When cgroup_mkdir reaches the limits of the cgroup hierarchy, it should
not return -EAGAIN, but instead react similarly to reaching the global
limit.

Signed-off-by: Vasily Averin <vvs@openvz.org>
---
 kernel/cgroup/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)