Message ID | 20211206212650.1895-1-fw@strlen.de (mailing list archive) |
---|---|
State | Accepted, archived |
Commit | cf6bfb9af34f91c85eb70fd33d6fbb43f469d374 |
Delegated to: | Matthieu Baerts |
Series | [v2,1/2] mptcp: clear 'kern' flag from fallback sockets |
On Mon, 6 Dec 2021, Florian Westphal wrote:

> The mptcp ULP extension relies on sk->sk_kern_sock being set correctly:
> It prevents setsockopt(fd, IPPROTO_TCP, TCP_ULP, "mptcp", 6); from
> working for plain tcp sockets (any userspace-exposed socket).
>
> But in case of fallback, accept() can return a plain tcp sk.
> In such case, sk is still tagged as 'kernel' and setsockopt will work.
>
> This will crash the kernel, because the subflow extension has a NULL
> ctx->conn mptcp socket:
>
> BUG: KASAN: null-ptr-deref in subflow_data_ready+0x181/0x2b0
> Call Trace:
>  tcp_data_ready+0xf8/0x370
>  [..]
>
> Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections")
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> v2: also handle early-return

Thanks - v2 looks good to me.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

>
>  net/mptcp/protocol.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 8319e601bc2d..4a8f2476cc75 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -3013,7 +3013,7 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
>  		 */
>  		if (WARN_ON_ONCE(!new_mptcp_sock)) {
>  			tcp_sk(newsk)->is_mptcp = 0;
> -			return newsk;
> +			goto out;
>  		}
>
>  		/* acquire the 2nd reference for the owning socket */
> @@ -3025,6 +3025,8 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
>  				MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
>  	}
>
> +out:
> +	newsk->sk_kern_sock = kern;
>  	return newsk;
>  }
>
> --
> 2.32.0

--
Mat Martineau
Intel
On Mon, 6 Dec 2021, Mat Martineau wrote:

> On Mon, 6 Dec 2021, Florian Westphal wrote:
>
>> [...]
>> @@ -3025,6 +3025,8 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
>>  				MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
>>  	}
>>
>> +out:
>> +	newsk->sk_kern_sock = kern;

Florian -

I was about to upstream this for -net, but have another question first.

Is there anything else in newsk that needs to be updated when changing
sk_kern_sock? sk_alloc() handles some reference counts differently for kern
socks, and sock_lock_init() sets things up differently for lockdep.

>>  	return newsk;
>>  }

--
Mat Martineau
Intel
Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
> On Mon, 6 Dec 2021, Mat Martineau wrote:
> > On Mon, 6 Dec 2021, Florian Westphal wrote:
> > > [...]
> > > +out:
> > > +	newsk->sk_kern_sock = kern;
>
> Florian -
>
> I was about to upstream this for -net, but have another question first.
>
> Is there anything else in newsk that needs to be updated when changing
> sk_kern_sock? sk_alloc() handles some reference counts differently for kern
> socks, and sock_lock_init() sets things up differently for lockdep.

AFAICS no.

The tcpsk inherits these settings from its parent (listen) sk, so they
always have 'kern = 1'.

Even before this change, the lockdep class is not correct (kernel, not user).

Need to export code from core to change this.

The netns refcount bump is not needed, but at this point it has already
happened so even if we undo+clear ->sk_net_refcnt it won't buy anything.

So the only alternative I see is to toss this patch and use a different
sk marker to block mptcp ulp on normal tcp sockets.

This would not change the incorrect lockdep class in this case of course
but would avoid messing with this.

tp->is_mptcp would come to mind, we only need to set it to 1 before
adding the mptcp ulp from inside the kernel rather than in the mptcp ulp
init function.
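The alternative marker can be pictured with a small user-space model. This is only a sketch of the idea as described in the message above, not kernel code: struct toy_tcp_sock and the toy_* functions are hypothetical names, the error code is an assumption, and the premise that a fallback socket reaches userspace with is_mptcp cleared is taken from the patch context rather than verified here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the TCP socket state relevant to the proposal. */
struct toy_tcp_sock {
	bool sk_kern_sock;	/* left as inherited; no longer consulted */
	bool is_mptcp;		/* the proposed marker, mirrors tp->is_mptcp */
	bool ulp_attached;	/* set once the toy ULP init succeeded */
};

/* ULP init under the proposal: gate on is_mptcp instead of sk_kern_sock. */
static int toy_ulp_init(struct toy_tcp_sock *tp)
{
	if (!tp->is_mptcp)
		return -EOPNOTSUPP;	/* assumed error code */
	tp->ulp_attached = true;
	return 0;
}

/* In-kernel path: set the marker first, then attach the ULP. */
static int toy_kernel_attach_mptcp(struct toy_tcp_sock *tp)
{
	tp->is_mptcp = true;
	return toy_ulp_init(tp);
}

/* Userspace path (setsockopt TCP_ULP): has no way to set the marker. */
static int toy_user_attach_mptcp(struct toy_tcp_sock *tp)
{
	return toy_ulp_init(tp);
}

/* Walks through both paths; returns 0 when every check holds. */
static int toy_marker_demo(void)
{
	/* Subflow created by the MPTCP code itself. */
	struct toy_tcp_sock subflow = { .sk_kern_sock = true };
	/* Fallback socket handed to userspace: still kern-tagged, marker clear. */
	struct toy_tcp_sock fallback = { .sk_kern_sock = true, .is_mptcp = false };

	assert(toy_kernel_attach_mptcp(&subflow) == 0);
	assert(subflow.ulp_attached);
	assert(toy_user_attach_mptcp(&fallback) == -EOPNOTSUPP);
	assert(!fallback.ulp_attached);
	return 0;
}
```

In this model the fallback socket keeps sk_kern_sock set, so the lockdep class and the kern bit stay consistent with each other while the ULP is still refused.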
On Fri, 2021-12-10 at 10:00 +0100, Florian Westphal wrote:
> Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
> > [...]
> > Is there anything else in newsk that needs to be updated when changing
> > sk_kern_sock? sk_alloc() handles some reference counts differently for kern
> > socks, and sock_lock_init() sets things up differently for lockdep.
>
> AFAICS no.
>
> The tcpsk inherits these settings from its parent (listen) sk, so they
> always have 'kern = 1'.
>
> Even before this change, the lockdep class is not correct (kernel, not user).
>
> Need to export code from core to change this.

I personally would go this way, with a separate patch, possibly adding
a new helper for that.

Somewhat related: I don't see where the lockdep class for
sk_callback_lock is set properly for any in-kernel user doing accept()
on a plain TCP socket (I mean: not an mptcp listener!). sk_clone_lock()
calls sk_init_common(), which unconditionally uses the user-space
lockdep class. ?!?

Cheers,

Paolo
On Fri, 10 Dec 2021, Paolo Abeni wrote:

> On Fri, 2021-12-10 at 10:00 +0100, Florian Westphal wrote:
>> [...]
>> Even before this change, the lockdep class is not correct (kernel, not user).
>>
>> Need to export code from core to change this.
>
> I personally would go this way, with a separate patch, possibly adding
> a new helper for that.
>

Are you thinking that would be cleanup for net-next? Or urgent enough for
-net? I lean toward net-next, given the likely backporting of this fix.

> Somewhat related: I don't see where the lockdep class for
> sk_callback_lock is set properly for any in-kernel user doing accept()
> on a plain TCP socket (I mean: not an mptcp listener!). sk_clone_lock()
> calls sk_init_common(), which unconditionally uses the user-space
> lockdep class. ?!?
>

Yeah - af_kern_callback_keys is only referenced in sock_init_data(), which
always inits the lockdep class for sk_callback_lock for userspace first by
calling sk_init_common(), then always calls lockdep_set_class_and_name() a
second time for sk_callback_lock (setting it appropriately for kern or
userspace).

--
Mat Martineau
Intel
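The initialisation order described in this exchange can be modelled in a few lines. This is a user-space sketch only: the enum, struct toy_sock and the toy_* functions are hypothetical stand-ins that copy the call order as stated in the thread, and nothing here touches real lockdep.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_lock_class { TOY_CLASS_UNSET, TOY_CLASS_USER, TOY_CLASS_KERN };

/* Hypothetical stand-in for struct sock. */
struct toy_sock {
	bool sk_kern_sock;
	enum toy_lock_class callback_lock_class; /* class of sk_callback_lock */
};

/* Like sk_init_common() as described: always picks the userspace class. */
static void toy_sk_init_common(struct toy_sock *sk)
{
	sk->callback_lock_class = TOY_CLASS_USER;
}

/*
 * Like sock_init_data() as described: common init first, then a second
 * assignment that depends on sk_kern_sock.
 */
static void toy_sock_init_data(struct toy_sock *sk)
{
	toy_sk_init_common(sk);
	sk->callback_lock_class =
		sk->sk_kern_sock ? TOY_CLASS_KERN : TOY_CLASS_USER;
}

/* Like sk_clone_lock() as described: copy the parent, run only the common init. */
static void toy_sk_clone_lock(const struct toy_sock *parent,
			      struct toy_sock *child)
{
	*child = *parent;
	toy_sk_init_common(child);
}

/* Returns 0 when the model reproduces the mismatch discussed in the thread. */
static int toy_lockdep_demo(void)
{
	struct toy_sock listener = { .sk_kern_sock = true };
	struct toy_sock child;

	toy_sock_init_data(&listener);
	assert(listener.callback_lock_class == TOY_CLASS_KERN);

	/* The accepted child keeps the kern bit but gets the user class. */
	toy_sk_clone_lock(&listener, &child);
	assert(child.sk_kern_sock);
	assert(child.callback_lock_class == TOY_CLASS_USER);
	return 0;
}
```

The model shows why the mismatch predates the patch: only the second assignment in the toy_sock_init_data() path looks at the kern bit, and the clone path never reaches it.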
On Fri, 10 Dec 2021, Florian Westphal wrote:

> Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
>> [...]
>> Is there anything else in newsk that needs to be updated when changing
>> sk_kern_sock? sk_alloc() handles some reference counts differently for kern
>> socks, and sock_lock_init() sets things up differently for lockdep.
>
> AFAICS no.
>
> The tcpsk inherits these settings from its parent (listen) sk, so they
> always have 'kern = 1'.
>
> Even before this change, the lockdep class is not correct (kernel, not user).
>
> Need to export code from core to change this.
>
> The netns refcount bump is not needed, but at this point it has already
> happened so even if we undo+clear ->sk_net_refcnt it won't buy anything.
>

Ok, thanks for the background on the refcounts. I also now see the code in
mptcp_subflow_create_socket() that already adjusts the refcounts.

> So the only alternative I see is to toss this patch and use a different
> sk marker to block mptcp ulp on normal tcp sockets.
>
> This would not change the incorrect lockdep class in this case of course
> but would avoid messing with this.
>
> tp->is_mptcp would come to mind, we only need to set it to 1 before
> adding the mptcp ulp from inside the kernel rather than in the mptcp ulp
> init function.
>

So the question is which inconsistency is better: a mismatch between the
lockdep class and the sk_kern_sock bit (the original patch for this email
thread), or having a sk_kern_sock=1 socket out in userspace (the proposed
alternative).

Neither seems ideal, but neither appears to have serious consequences.

For a -net fix now, this patch (clearing the kern bit) seems like the most
straightforward for backporting. The lockdep fix could be handled
independently, as it's a separate existing issue?

I will plan to upstream the existing patches from the export branch on
Monday if there's no objection posted here!

--
Mat Martineau
Intel
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 8319e601bc2d..4a8f2476cc75 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3013,7 +3013,7 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
 		 */
 		if (WARN_ON_ONCE(!new_mptcp_sock)) {
 			tcp_sk(newsk)->is_mptcp = 0;
-			return newsk;
+			goto out;
 		}
 
 		/* acquire the 2nd reference for the owning socket */
@@ -3025,6 +3025,8 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
 				MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
 	}
 
+out:
+	newsk->sk_kern_sock = kern;
 	return newsk;
 }
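To see what the two hunks change, here is a minimal user-space sketch of the tail of mptcp_accept(). It is a toy model, not kernel code: struct toy_sock, toy_accept_tail() and the `patched` switch are hypothetical names invented for illustration, and only the exit paths touched by the diff are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock; only the fields the diff touches. */
struct toy_sock {
	bool sk_kern_sock;	/* "created by the kernel" marker */
	bool is_mptcp;		/* mirrors tcp_sk(sk)->is_mptcp */
};

/*
 * Models the tail of mptcp_accept(). newsk starts out as the accepted
 * subflow, which carries sk_kern_sock = true from its in-kernel listener.
 * new_mptcp_sock is NULL when no MPTCP socket could be allocated.
 * With patched == false each exit returns directly (old behaviour);
 * with patched == true every exit goes through the "out" label.
 */
static struct toy_sock *toy_accept_tail(struct toy_sock *newsk,
					struct toy_sock *new_mptcp_sock,
					bool mptcp_negotiated, bool kern,
					bool patched)
{
	if (mptcp_negotiated) {
		if (!new_mptcp_sock) {
			newsk->is_mptcp = false;
			if (!patched)
				return newsk;	/* old: kern flag left untouched */
			goto out;
		}
		newsk = new_mptcp_sock;
	}
	/* Otherwise: plain TCP fallback, newsk stays the TCP socket. */
	if (!patched)
		return newsk;
out:
	newsk->sk_kern_sock = kern;
	return newsk;
}

/* Exercises both exits for a userspace accept() (kern == false). */
static int toy_accept_demo(void)
{
	struct toy_sock a = { .sk_kern_sock = true, .is_mptcp = false };
	struct toy_sock b = { .sk_kern_sock = true, .is_mptcp = false };
	struct toy_sock c = { .sk_kern_sock = true, .is_mptcp = true };
	struct toy_sock d = { .sk_kern_sock = true, .is_mptcp = true };

	/* Fallback to plain TCP: unpatched keeps the flag, patched clears it. */
	assert(toy_accept_tail(&a, NULL, false, false, false)->sk_kern_sock);
	assert(!toy_accept_tail(&b, NULL, false, false, true)->sk_kern_sock);

	/* Early exit (the v2 change): same story for the WARN_ON_ONCE path. */
	assert(toy_accept_tail(&c, NULL, true, false, false)->sk_kern_sock);
	assert(!toy_accept_tail(&d, NULL, true, false, true)->sk_kern_sock);
	assert(!d.is_mptcp);
	return 0;
}
```

Routing the early exit through the same label is what the "v2: also handle early-return" changelog entry refers to: without it, that path would still hand out a socket with the kern bit set.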
The mptcp ULP extension relies on sk->sk_kern_sock being set correctly:
It prevents setsockopt(fd, IPPROTO_TCP, TCP_ULP, "mptcp", 6); from
working for plain tcp sockets (any userspace-exposed socket).

But in case of fallback, accept() can return a plain tcp sk.
In such case, sk is still tagged as 'kernel' and setsockopt will work.

This will crash the kernel, because the subflow extension has a NULL
ctx->conn mptcp socket:

BUG: KASAN: null-ptr-deref in subflow_data_ready+0x181/0x2b0
Call Trace:
 tcp_data_ready+0xf8/0x370
 [..]

Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
v2: also handle early-return

 net/mptcp/protocol.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
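The setsockopt() call named in the commit message can be issued from an ordinary program. The sketch below is a hedged probe, not a reproducer for the crash: it tries TCP_ULP on a freshly created TCP socket, which according to the commit message should be refused. The helper names try_mptcp_ulp() and probe_plain_tcp() are made up for this example, the fallback definition of TCP_ULP (31, as in linux/tcp.h) is only used when the libc headers lack it, and the exact errno depends on the kernel version and configuration.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_ULP
#define TCP_ULP 31	/* value from linux/tcp.h, for older libc headers */
#endif

/*
 * Try to attach the "mptcp" ULP to fd, using the same arguments as the
 * commit message (length 6 includes the terminating NUL).
 * Returns 0 if the kernel accepted the request, or a negative errno.
 */
static int try_mptcp_ulp(int fd)
{
	assert(fd >= 0);	/* caller must pass an open socket */
	if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "mptcp", 6) == 0)
		return 0;
	return -errno;
}

/* Probe a brand-new plain TCP socket; returns what try_mptcp_ulp() reports. */
static int probe_plain_tcp(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	int ret;

	if (fd < 0)
		return -errno;
	ret = try_mptcp_ulp(fd);
	close(fd);
	return ret;
}
```

To exercise the case the patch addresses, the same helper would have to be called on a socket returned by accept() from an MPTCP listener after the peer fell back to plain TCP; setting up that scenario is outside the scope of this sketch.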