diff mbox series

[bpf,1/2] bpf, cgroup: Assign cgroup in cgroup_sk_alloc when called from interrupt

Message ID fe51fd2101fe9df82750d0beb2772ef77ba06bcf.1632427246.git.daniel@iogearbox.net (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series [bpf,1/2] bpf, cgroup: Assign cgroup in cgroup_sk_alloc when called from interrupt | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf
netdev/subject_prefix success Link
netdev/cc_maintainers fail 1 blamed authors not CCed: ast@kernel.org; 10 maintainers not CCed: hannes@cmpxchg.org ast@kernel.org kpsingh@kernel.org john.fastabend@gmail.com yhs@fb.com cgroups@vger.kernel.org songliubraving@fb.com netdev@vger.kernel.org lizefan.x@bytedance.com kafai@fb.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 15 this patch: 15
netdev/kdoc success Errors and warnings before: 2 this patch: 2
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 34 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 15 this patch: 15
netdev/header_inline success Link
bpf/vmtest-bpf-PR success PR summary
bpf/vmtest-bpf success VM_Test

Commit Message

Daniel Borkmann Sept. 23, 2021, 8:09 p.m. UTC
If cgroup_sk_alloc() is called from interrupt context, then just assign the
root cgroup to skcd->cgroup. Prior to commit 8520e224f547 ("bpf, cgroups:
Fix cgroup v2 fallback on v1/v2 mixed mode") we would just return, and later
on in sock_cgroup_ptr(), we were NULL-testing the cgroup in fast-path. Rather
than re-adding the NULL-test to the fast-path we can just assign it once from
cgroup_sk_alloc() given v1/v2 handling has been simplified.

Fixes: 8520e224f547 ("bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode")
Reported-by: syzbot+df709157a4ecaf192b03@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+df709157a4ecaf192b03@syzkaller.appspotmail.com
Cc: Tejun Heo <tj@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
---
 kernel/cgroup/cgroup.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Comments

Tejun Heo Sept. 24, 2021, 5:21 p.m. UTC | #1
On Thu, Sep 23, 2021 at 10:09:23PM +0200, Daniel Borkmann wrote:
> If cgroup_sk_alloc() is called from interrupt context, then just assign the
> root cgroup to skcd->cgroup. Prior to commit 8520e224f547 ("bpf, cgroups:
> Fix cgroup v2 fallback on v1/v2 mixed mode") we would just return, and later
> on in sock_cgroup_ptr(), we were NULL-testing the cgroup in fast-path. Rather
> than re-adding the NULL-test to the fast-path we can just assign it once from
> cgroup_sk_alloc() given v1/v2 handling has been simplified.

I think you should explain why this is safe - ie. when do we hit the
condition and leak the socket to the root cgroup and why is that okay?

Thanks.
Daniel Borkmann Sept. 24, 2021, 9:20 p.m. UTC | #2
On 9/24/21 7:21 PM, Tejun Heo wrote:
> On Thu, Sep 23, 2021 at 10:09:23PM +0200, Daniel Borkmann wrote:
>> If cgroup_sk_alloc() is called from interrupt context, then just assign the
>> root cgroup to skcd->cgroup. Prior to commit 8520e224f547 ("bpf, cgroups:
>> Fix cgroup v2 fallback on v1/v2 mixed mode") we would just return, and later
>> on in sock_cgroup_ptr(), we were NULL-testing the cgroup in fast-path. Rather
>> than re-adding the NULL-test to the fast-path we can just assign it once from
>> cgroup_sk_alloc() given v1/v2 handling has been simplified.
> 
> I think you should explain why this is safe - ie. when do we hit the
> condition and leak the socket to the root cgroup and why is that okay?

I'll add it to the commit log. What I was trying to say is that before the
8520e224f547 fix, the cgroup_sk_alloc() would bail out and return early when
in_interrupt(), and in that case skcd->cgroup remained NULL. sock_cgroup_ptr()
had a NULL check for it in the old code where it says 'v ?: &cgrp_dfl_root.cgrp'
and I wonder given the !CONFIG_CGROUP_NET_PRIO && !CONFIG_CGROUP_NET_CLASSID
path doesn't have it, whether syzbot never tripped over it because it was not
testing with such config. From the repro [0], this is specific to old netrom
legacy code where RX handler nr_rx_frame() calls nr_make_new() which calls
sk_alloc() and therefore cgroup_sk_alloc() with in_interrupt() condition. Thus
the NULL skcd->cgroup, where it trips over on cgroup_sk_free() side [1]. I'm
certain you hit the same in netrom with mentioned configs off.

Thanks,
Daniel

   [0] https://syzkaller.appspot.com/x/repro.syz?x=12f2b28d300000
   [1] https://lkml.org/lkml/2021/9/17/1041
diff mbox series

Patch

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 8afa8690d288..570b0c97392a 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6574,22 +6574,29 @@  int cgroup_parse_float(const char *input, unsigned dec_shift, s64 *v)
 
 void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
 {
-	/* Don't associate the sock with unrelated interrupted task's cgroup. */
-	if (in_interrupt())
-		return;
+	struct cgroup *cgroup;
 
 	rcu_read_lock();
+	/* Don't associate the sock with unrelated interrupted task's cgroup. */
+	if (in_interrupt()) {
+		cgroup = &cgrp_dfl_root.cgrp;
+		cgroup_get(cgroup);
+		goto out;
+	}
+
 	while (true) {
 		struct css_set *cset;
 
 		cset = task_css_set(current);
 		if (likely(cgroup_tryget(cset->dfl_cgrp))) {
-			skcd->cgroup = cset->dfl_cgrp;
-			cgroup_bpf_get(cset->dfl_cgrp);
+			cgroup = cset->dfl_cgrp;
 			break;
 		}
 		cpu_relax();
 	}
+out:
+	skcd->cgroup = cgroup;
+	cgroup_bpf_get(cgroup);
 	rcu_read_unlock();
 }