Message ID | fe51fd2101fe9df82750d0beb2772ef77ba06bcf.1632427246.git.daniel@iogearbox.net (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | [bpf,1/2] bpf, cgroup: Assign cgroup in cgroup_sk_alloc when called from interrupt | expand |
On Thu, Sep 23, 2021 at 10:09:23PM +0200, Daniel Borkmann wrote: > If cgroup_sk_alloc() is called from interrupt context, then just assign the > root cgroup to skcd->cgroup. Prior to commit 8520e224f547 ("bpf, cgroups: > Fix cgroup v2 fallback on v1/v2 mixed mode") we would just return, and later > on in sock_cgroup_ptr(), we were NULL-testing the cgroup in fast-path. Rather > than re-adding the NULL-test to the fast-path we can just assign it once from > cgroup_sk_alloc() given v1/v2 handling has been simplified. I think you should explain why this is safe - ie. when do we hit the condition and leak the socket to the root cgroup and why is that okay? Thanks.
On 9/24/21 7:21 PM, Tejun Heo wrote: > On Thu, Sep 23, 2021 at 10:09:23PM +0200, Daniel Borkmann wrote: >> If cgroup_sk_alloc() is called from interrupt context, then just assign the >> root cgroup to skcd->cgroup. Prior to commit 8520e224f547 ("bpf, cgroups: >> Fix cgroup v2 fallback on v1/v2 mixed mode") we would just return, and later >> on in sock_cgroup_ptr(), we were NULL-testing the cgroup in fast-path. Rather >> than re-adding the NULL-test to the fast-path we can just assign it once from >> cgroup_sk_alloc() given v1/v2 handling has been simplified. > > I think you should explain why this is safe - ie. when do we hit the > condition and leak the socket to the root cgroup and why is that okay? I'll add it to the commit log. What I was trying to say is that before the 8520e224f547 fix, the cgroup_sk_alloc() would bail out and return early when in_interrupt(), and in that case skcd->cgroup remained NULL. sock_cgroup_ptr() had a NULL check for it in the old code where it says 'v ?: &cgrp_dfl_root.cgrp' and I wonder given the !CONFIG_CGROUP_NET_PRIO && !CONFIG_CGROUP_NET_CLASSID path doesn't have it, whether syzbot never tripped over it because it was not testing with such config. From the repro [0], this is specific to old netrom legacy code where RX handler nr_rx_frame() calls nr_make_new() which calls sk_alloc() and therefore cgroup_sk_alloc() with in_interrupt() condition. Thus the NULL skcd->cgroup, where it trips over on cgroup_sk_free() side [1]. I'm certain you hit the same in netrom with mentioned configs off. Thanks, Daniel [0] https://syzkaller.appspot.com/x/repro.syz?x=12f2b28d300000 [1] https://lkml.org/lkml/2021/9/17/1041
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 8afa8690d288..570b0c97392a 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -6574,22 +6574,29 @@ int cgroup_parse_float(const char *input, unsigned dec_shift, s64 *v) void cgroup_sk_alloc(struct sock_cgroup_data *skcd) { - /* Don't associate the sock with unrelated interrupted task's cgroup. */ - if (in_interrupt()) - return; + struct cgroup *cgroup; rcu_read_lock(); + /* Don't associate the sock with unrelated interrupted task's cgroup. */ + if (in_interrupt()) { + cgroup = &cgrp_dfl_root.cgrp; + cgroup_get(cgroup); + goto out; + } + while (true) { struct css_set *cset; cset = task_css_set(current); if (likely(cgroup_tryget(cset->dfl_cgrp))) { - skcd->cgroup = cset->dfl_cgrp; - cgroup_bpf_get(cset->dfl_cgrp); + cgroup = cset->dfl_cgrp; break; } cpu_relax(); } +out: + skcd->cgroup = cgroup; + cgroup_bpf_get(cgroup); rcu_read_unlock(); }