
[v2,1/2] mptcp: clear 'kern' flag from fallback sockets

Message ID 20211206212650.1895-1-fw@strlen.de
State Accepted, archived
Commit cf6bfb9af34f91c85eb70fd33d6fbb43f469d374
Delegated to: Matthieu Baerts
Series [v2,1/2] mptcp: clear 'kern' flag from fallback sockets

Commit Message

Florian Westphal Dec. 6, 2021, 9:26 p.m. UTC
The mptcp ULP extension relies on sk->sk_kern_sock being set correctly:
it prevents setsockopt(fd, IPPROTO_TCP, TCP_ULP, "mptcp", 6); from
working for plain tcp sockets (any userspace-exposed socket).

But in case of fallback, accept() can return a plain tcp sk.
In such a case, sk is still tagged as 'kernel' and setsockopt will work.

This will crash the kernel: the subflow extension has a NULL ctx->conn
mptcp socket:

BUG: KASAN: null-ptr-deref in subflow_data_ready+0x181/0x2b0
Call Trace:
 tcp_data_ready+0xf8/0x370
 [..]

Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 v2: also handle early-return

 net/mptcp/protocol.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
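To make the bug concrete, the following userspace model (not kernel code) mimics the two facts the commit message states: the mptcp ULP refuses sockets that are not tagged as kernel sockets, and before the fix a fallback socket returned by accept() still carried the listener's 'kernel' tag. Everything here is hypothetical: `struct model_sock` stands in for the real socket structures, and `model_ulp_mptcp_init()` and `model_accept_old()` are illustrative names only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct sock / tcp_sock. */
struct model_sock {
	bool kern_sock; /* models sk->sk_kern_sock */
	bool is_mptcp;  /* models tcp_sk(sk)->is_mptcp */
	void *conn;     /* models subflow ctx->conn, the owning mptcp socket */
};

/* Models the ULP gate: only sockets flagged as kernel sockets may get the
 * "mptcp" ULP. Returns 0 or -EOPNOTSUPP, like a failing setsockopt(). */
static int model_ulp_mptcp_init(struct model_sock *sk)
{
	if (!sk->kern_sock)
		return -EOPNOTSUPP;
	/* Nothing on this path sets sk->conn; that corresponds to the NULL
	 * ctx->conn behind the KASAN report in subflow_data_ready(). */
	return 0;
}

/* Models accept() before the fix, on fallback: the new plain TCP socket is
 * cloned from the kernel-created listening subflow and keeps kern_sock. */
static struct model_sock model_accept_old(const struct model_sock *listener)
{
	struct model_sock newsk;

	assert(listener != NULL);
	newsk = *listener;      /* the clone inherits the parent's flags */
	newsk.is_mptcp = false; /* fallback to plain TCP */
	newsk.conn = NULL;
	return newsk;
}
```

With the pre-fix behavior, the socket handed to userspace passes the gate, so the ULP attaches without an owning mptcp socket; a socket correctly tagged as non-kernel is rejected.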

Comments

Mat Martineau Dec. 6, 2021, 9:49 p.m. UTC | #1
On Mon, 6 Dec 2021, Florian Westphal wrote:

> [...]
> v2: also handle early-return

Thanks - v2 looks good to me.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

> [...]

--
Mat Martineau
Intel
Mat Martineau Dec. 10, 2021, 1:38 a.m. UTC | #2
On Mon, 6 Dec 2021, Mat Martineau wrote:

> On Mon, 6 Dec 2021, Florian Westphal wrote:
>
>> [...]
>
> Thanks - v2 looks good to me.
>
> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
>
>> [...]
>> +out:
>> +	newsk->sk_kern_sock = kern;

Florian -

I was about to upstream this for -net, but have another question first.

Is there anything else in newsk that needs to be updated when changing 
sk_kern_sock? sk_alloc() handles some reference counts differently for 
kern socks, and sock_lock_init() sets things up differently for lockdep.


--
Mat Martineau
Intel
Florian Westphal Dec. 10, 2021, 9 a.m. UTC | #3
Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
> On Mon, 6 Dec 2021, Mat Martineau wrote:
> 
> > [...]
> > > +out:
> > > +	newsk->sk_kern_sock = kern;
> 
> Florian -
> 
> I was about to upstream this for -net, but have another question first.
> 
> Is there anything else in newsk that needs to be updated when changing
> sk_kern_sock? sk_alloc() handles some reference counts differently for kern
> socks, and sock_lock_init() sets things up differently for lockdep.

AFAICS no.

The tcp sk inherits these settings from its parent (listening) sk, so they
always reflect 'kern = 1'.

Even before this change, the lockdep class is not correct (kernel, not user).

Changing this would require exporting code from the core.

The netns refcount bump is not needed, but at this point it has already
happened, so even if we undo it and clear ->sk_net_refcnt, it won't buy
anything.

So the only alternative I see is to toss this patch and use a different
sk marker to block the mptcp ULP on normal tcp sockets.

This would not fix the incorrect lockdep class in this case, of course,
but would avoid messing with the kern flag.

tp->is_mptcp comes to mind: we would only need to set it to 1 from inside
the kernel before adding the mptcp ULP, rather than in the mptcp ULP init
function.
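Purely as an illustration of this proposal (the thread kept the kern-flag approach), here is a userspace sketch: the ULP init would check `is_mptcp` instead of `sk_kern_sock`, and the in-kernel subflow code would set `is_mptcp` before attaching the ULP. `model_ulp_init_proposed()`, `model_kernel_attach()` and `struct model_sock` are hypothetical names. The sketch relies on fallback sockets having `is_mptcp` cleared, which the patch's WARN path does explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sock / tcp_sock. */
struct model_sock {
	bool kern_sock; /* models sk->sk_kern_sock; no longer the gate */
	bool is_mptcp;  /* models tcp_sk(sk)->is_mptcp; the proposed gate */
	bool has_ulp;   /* whether the mptcp ULP got attached */
};

/* Proposed ULP init: refuse unless an in-kernel caller marked the socket. */
static int model_ulp_init_proposed(struct model_sock *sk)
{
	if (!sk->is_mptcp)
		return -EOPNOTSUPP;
	sk->has_ulp = true;
	return 0;
}

/* In-kernel subflow setup: set the marker first, then attach the ULP. */
static int model_kernel_attach(struct model_sock *sk)
{
	assert(sk->kern_sock); /* only kernel-created subflows come here */
	sk->is_mptcp = true;
	return model_ulp_init_proposed(sk);
}
```

A fallback socket is then rejected even though its sk_kern_sock bit is still 1; that leftover bit is the inconsistency weighed later in the thread.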
Paolo Abeni Dec. 10, 2021, 10:46 a.m. UTC | #4
On Fri, 2021-12-10 at 10:00 +0100, Florian Westphal wrote:
> Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
> > [...]
> > 
> > Florian -
> > 
> > I was about to upstream this for -net, but have another question first.
> > 
> > Is there anything else in newsk that needs to be updated when changing
> > sk_kern_sock? sk_alloc() handles some reference counts differently for kern
> > socks, and sock_lock_init() sets things up differently for lockdep.
> 
> AFAICS no.
> 
> The tcpsk inherits these settings from its parent (listen) sk, so they
> always have 'kern = 1'.
> 
> Even before this change, lock depclass is not correct (kernel, not user).
> 
> Need to export code from core to change this.

I personally would go this way, with a separate patch, possibly adding
a new helper for that.

Somewhat related: I don't see where the lockdep class for
sk_callback_lock is set properly for any in-kernel user doing accept()
on a plain TCP socket (I mean: not an mptcp listener!). sk_clone_lock()
calls sk_init_common(), which unconditionally uses the user-space
lockdep class. ?!?

Cheers,

Paolo
Mat Martineau Dec. 10, 2021, 8:48 p.m. UTC | #5
On Fri, 10 Dec 2021, Paolo Abeni wrote:

> On Fri, 2021-12-10 at 10:00 +0100, Florian Westphal wrote:
>> Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
>>> [...]
>>>
>>> Florian -
>>>
>>> I was about to upstream this for -net, but have another question first.
>>>
>>> Is there anything else in newsk that needs to be updated when changing
>>> sk_kern_sock? sk_alloc() handles some reference counts differently for kern
>>> socks, and sock_lock_init() sets things up differently for lockdep.
>>
>> AFAICS no.
>>
>> The tcpsk inherits these settings from its parent (listen) sk, so they
>> always have 'kern = 1'.
>>
>> Even before this change, lock depclass is not correct (kernel, not user).
>>
>> Need to export code from core to change this.
>
> I personally would go this way, with a separate patch, possibly adding
> a new helper for that.
>

Are you thinking that would be a cleanup for net-next? Or is it urgent
enough for -net?

I lean toward net-next, given the likely backporting of this fix.

> Somewhat related: I don't see where the lockdep class for
> sk_callback_lock is set properly for any in-kernel user doing accept()
> on plain TCP socket (I mean: not an mptcp listener!). sk_clone_lock()
> calls sk_init_common() which uses unconditionally the user-space
> lockdep class. ?!?
>

Yeah - af_kern_callback_keys is only referenced in sock_init_data(), which
first initializes the sk_callback_lock lockdep class for userspace by
calling sk_init_common(), then always calls lockdep_set_class_and_name() a
second time on sk_callback_lock (choosing the kern or userspace class as
appropriate). sk_clone_lock() only does the first step, so a cloned socket
always ends up with the userspace class.

--
Mat Martineau
Intel
Mat Martineau Dec. 10, 2021, 11:04 p.m. UTC | #6
On Fri, 10 Dec 2021, Florian Westphal wrote:

> Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
>> [...]
>>
>> Florian -
>>
>> I was about to upstream this for -net, but have another question first.
>>
>> Is there anything else in newsk that needs to be updated when changing
>> sk_kern_sock? sk_alloc() handles some reference counts differently for kern
>> socks, and sock_lock_init() sets things up differently for lockdep.
>
> AFAICS no.
>
> The tcpsk inherits these settings from its parent (listen) sk, so they
> always have 'kern = 1'.
>
> Even before this change, lock depclass is not correct (kernel, not user).
>
> Need to export code from core to change this.
>
> The netns refcount bump is not needed, but at this point it has already
> happened so even if we undo+clear ->sk_net_refcnt it won't buy anything.
>

Ok, thanks for the background on the refcounts. I also now see the code in
mptcp_subflow_create_socket() that already adjusts the refcounts.

> So only alternative I see is to toss this patch and use a different
> sk marker to block mptcp ulp on normal tcp sockets.
>
> This would not change the incorrect lockdep class in this case of course
> but would avoid messing with this.
>
> tp->is_mptcp would come to mind, we only need to set it to 1 before
> adding the mptcp ulp from inside the kernel rather than in the mptcp ulp
> init function.
>

So the question is which inconsistency is preferable: a mismatch between
the lockdep class and the sk_kern_sock bit (the original patch for this
email thread), or a sk_kern_sock=1 socket out in userspace (the proposed
alternative).

Neither seems ideal, but neither appears to have serious consequences.
For a -net fix now, this patch (clearing the kern bit) seems the most
straightforward to backport. The lockdep fix could be handled
independently, as it's a separate, existing issue?

I plan to upstream the existing patches from the export branch on
Monday if there's no objection posted here!


--
Mat Martineau
Intel

Patch

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 8319e601bc2d..4a8f2476cc75 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3013,7 +3013,7 @@  static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
 		 */
 		if (WARN_ON_ONCE(!new_mptcp_sock)) {
 			tcp_sk(newsk)->is_mptcp = 0;
-			return newsk;
+			goto out;
 		}
 
 		/* acquire the 2nd reference for the owning socket */
@@ -3025,6 +3025,8 @@  static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
 				MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
 	}
 
+out:
+	newsk->sk_kern_sock = kern;
 	return newsk;
 }
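The patch routes both the fallback branch and the `WARN_ON_ONCE()` early return through a single `out:` label, so every non-NULL socket `mptcp_accept()` returns (as far as the diff shows) gets `sk_kern_sock` set from the caller's `kern` argument. A userspace sketch of that single-exit pattern follows; `struct model_sock` and `model_accept_fixed()` are hypothetical simplifications that omit the locking, reference counting and MIB accounting of the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock / tcp_sock. */
struct model_sock {
	bool kern_sock;          /* models sk->sk_kern_sock */
	bool is_mptcp;           /* models tcp_sk(sk)->is_mptcp */
	struct model_sock *conn; /* models subflow->conn; may be NULL */
};

/* Models mptcp_accept() after the fix: every exit passes through "out". */
static struct model_sock *model_accept_fixed(struct model_sock *newsk, bool kern)
{
	assert(newsk != NULL);

	if (newsk->is_mptcp) {
		struct model_sock *new_mptcp_sock = newsk->conn;

		if (!new_mptcp_sock) {  /* the WARN_ON_ONCE() branch */
			newsk->is_mptcp = false;
			goto out;       /* was "return newsk;" before the fix */
		}
		newsk = new_mptcp_sock; /* return the mptcp socket instead */
	}
	/* otherwise: plain TCP fallback, newsk is returned unchanged */
out:
	newsk->kern_sock = kern;        /* the assignment the patch adds */
	return newsk;
}
```

A single exit label is the usual kernel idiom here: it keeps an early return from silently skipping the flag update, which is the case the v2 changelog ("also handle early-return") refers to.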