diff mbox series

net/smc: Buggy reordering scenario in smc socket

Message ID Zh0JLYHtd0i416XO@libra05 (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series net/smc: Buggy reordering scenario in smc socket | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Yewon Choi April 15, 2024, 11:02 a.m. UTC
Hello,
we suspect some buggy scenario due to memory reordering in concurrent execution 
of setsockopt() and sendmmsg().

(CPU 1) setsockopt():
    case TCP_FASTOPEN_NO_COOKIE:
        ...
        smc_switch_to_fallback():
            clcsock->file = sk.sk_socket->file; // (1)
            clcsock->file->private_data = clcsock; // (2)

(CPU 2) __sys_sendmmsg():
    sockfd_lookup_light():
        sock_from_file():
            sock = file->private_data; // (3)
    ...
    fput_light(sock->file, fput_needed): // (4)
        fput():
            refcount_dec_and_test(sock->file->f_count) // null-ptr-deref

There is no memory barrier between (1) and (2), so (1) might be reordered after 
(2) is written to memory. Then, execution order can be (2)->(3)->(4)->(1) 
and (4) will read uninitialized value which may cause system crash.


This kind of reordering may happen in smc_ulp_init():

(CPU 1) smc_ulp_init():
    ...
    smcsock->file = tcp->file; // (5)
	smcsock->file->private_data = smcsock; // (6)

Execution order can be (6)->(3)->(4)->(5), showing same symptom as above.


One possible solution seems to be adding release semantic in (2) and (6).


I think we don't need memory barrier between (3) and (4) because there are
critical section between (3) and (4), so lock(lock_sock/release_sock) will do this.


Could you check these? If confirmed to be a bug, we will send a patch.

Best Regards,
Yewon Choi

Comments

Yewon Choi April 15, 2024, 11:16 a.m. UTC | #1
On Mon, Apr 15, 2024 at 8:02 PM Yewon Choi <woni9911@gmail.com> wrote:
> Hello,
> we suspect some buggy scenario due to memory reordering in concurrent
> execution
> of setsockopt() and sendmmsg().
>
> (CPU 1) setsockopt():
>     case TCP_FASTOPEN_NO_COOKIE:
>         ...
>         smc_switch_to_fallback():
>             clcsock->file = sk.sk_socket->file; // (1)
>             clcsock->file->private_data = clcsock; // (2)
>
> (CPU 2) __sys_sendmmsg():
>     sockfd_lookup_light():
>         sock_from_file():
>             sock = file->private_data; // (3)
>     ...
>     fput_light(sock->file, fput_needed): // (4)
>         fput():
>             refcount_dec_and_test(sock->file->f_count) // null-ptr-deref
>
> There is no memory barrier between (1) and (2), so (1) might be reordered
> after
> (2) is written to memory. Then, execution order can be (2)->(3)->(4)->(1)
> and (4) will read uninitialized value which may cause system crash.
>
>
> This kind of reordering may happen in smc_ulp_init():
>
> (CPU 1) smc_ulp_init():
>     ...
>     smcsock->file = tcp->file; // (5)
>         smcsock->file->private_data = smcsock; // (6)
>
> Execution order can be (6)->(3)->(4)->(5), showing same symptom as above.
>
>
> One possible solution seems to be adding release semantic in (2) and (6).
>
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 4b52b3b159c0..37c23ef3e2d5 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -921,7 +921,7 @@ static int smc_switch_to_fallback(struct smc_sock
> *smc, int reason_code)
>         trace_smc_switch_to_fallback(smc, reason_code);
>         if (smc->sk.sk_socket && smc->sk.sk_socket->file) {
>                 smc->clcsock->file = smc->sk.sk_socket->file;
> -               smc->clcsock->file->private_data = smc->clcsock;
> +               smp_store_release(&smc->clcsock->file->private_data,
> smc->clcsock);
>                 smc->clcsock->wq.fasync_list =
>                         smc->sk.sk_socket->wq.fasync_list;
>                 smc->sk.sk_socket->wq.fasync_list = NULL;
> @@ -3410,7 +3410,7 @@ static int smc_ulp_init(struct sock *sk)
>
>         /* replace tcp socket to smc */
>         smcsock->file = tcp->file;
> -       smcsock->file->private_data = smcsock;
> +       smp_store_release(&smcsock->file->private_data, smcsock);
>         smcsock->file->f_inode = SOCK_INODE(smcsock); /* replace inode
> when sock_close */
>         smcsock->file->f_path.dentry->d_inode = SOCK_INODE(smcsock); /*
> dput() in __fput */
>         tcp->file = NULL;
>
> I think we don't need memory barrier between (3) and (4) because there are
> critical section between (3) and (4), so lock(lock_sock/release_sock) will
> do this.
>
>
> Could you check these? If confirmed to be a bug, we will send a patch.
>
> Best Regards,
> Yewon Choi
>

Additionally, we found that below line (1) in smc_ulp_init() triggers 
kernel panic even when normaly executed. 

smc_ulp_init():
    ...
    tcp->file = NULL; // (1)

It can be triggered by simple system calls: 
    int sk = socket(0xa, 0x1, 0)
    setsockopt(sk, 0x6, 0x1f, "smc", sizeof("smc"))

[350998.391059] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
[350998.391980] Mem abort info:
[350998.392288]   ESR = 0x0000000096000006
[350998.392691]   EC = 0x25: DABT (current EL), IL = 32 bits
[350998.393252]   SET = 0, FnV = 0
[350998.393586]   EA = 0, S1PTW = 0
[350998.396496]   FSC = 0x06: level 2 translation fault
[350998.399755] Data abort info:
[350998.400720]   ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
[350998.402329]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[350998.404023]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[350998.405543] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000047e44000
[350998.406735] [0000000000000018] pgd=080000004b288003, p4d=080000004b288003, pud=080000004aea9003, pmd=0000000000000000
[350998.409243] Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
[350998.409996] Modules linked in:
[350998.410404] CPU: 1 PID: 2936860 Comm: tls Not tainted 6.8.0-rc5-00163-gffd2cb6b718e-dirty #45
[350998.411462] Hardware name: linux,dummy-virt (DT)
[350998.412050] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[350998.412923] pc : fput+0x20/0x188
[350998.413349] lr : __sys_setsockopt+0xb4/0xc0
[350998.413889] sp : ffff800080443d90
[350998.414325] x29: ffff800080443d90 x28: ffff0000051cc740 x27: 0000000000000000
[350998.415218] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[350998.416112] x23: 0000000000000004 x22: 00000000004613c8 x21: 000000000000001f
[350998.417007] x20: 0000000000000006 x19: 0000000000000000 x18: 0000000000000001
[350998.417909] x17: ffffc369333ee3cc x16: ffffc36933410ad8 x15: ffffc369335203a8
[350998.418797] x14: ffffc36933520188 x13: ffffc36932426dc0 x12: ffffc36932426cf4
[350998.419621] x11: ffffc36932426bec x10: ffffc36933522a34 x9 : 0000000fffffffe0
[350998.420447] x8 : ffffc3693351ef8c x7 : ffff00000a790578 x6 : ffff00000a790558
[350998.421273] x5 : ffff00000a790420 x4 : ffff0000051cc740 x3 : 0000000000000001
[350998.422105] x2 : 0000000000000000 x1 : 0000000000000018 x0 : ffffffffffffffff
[350998.422932] Call trace:
[350998.423231]  fput+0x20/0x188
[350998.423583]  __sys_setsockopt+0xb4/0xc0
[350998.424041]  __arm64_sys_setsockopt+0x28/0x38
[350998.424557]  invoke_syscall+0x48/0x114
[350998.425006]  el0_svc_common+0x3c/0xe8
[350998.425444]  do_el0_svc+0x20/0x2c
[350998.425844]  el0_svc+0x34/0xb8
[350998.426235]  el0t_64_sync_handler+0x13c/0x158
[350998.426749]  el0t_64_sync+0x190/0x194
[350998.427187] Code: aa0003f3 d503201f 92800000 91006261 (f8e00020) 
[350998.427893] ---[ end trace 0000000000000000 ]---
[350998.428460] Kernel panic - not syncing: Oops: Fatal exception
[350998.429126] SMP: stopping secondary CPUs
[350998.429617] Kernel Offset: 0x4368b2400000 from 0xffff800080000000
[350998.430335] PHYS_OFFSET: 0x40000000
[350998.430752] CPU features: 0x0,00000021,7002014a,2140720b
[350998.431371] Memory Limit: none
[350998.431741] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---

Could you check this, too?

Yewon Choi
Tony Lu April 16, 2024, 2:15 a.m. UTC | #2
On Mon, Apr 15, 2024 at 08:16:54PM +0900, Yewon Choi wrote:
> 
> On Mon, Apr 15, 2024 at 8:02 PM Yewon Choi <woni9911@gmail.com> wrote:
> > Hello,
> > we suspect some buggy scenario due to memory reordering in concurrent
> > execution
> > of setsockopt() and sendmmsg().
> >
> > (CPU 1) setsockopt():
> >     case TCP_FASTOPEN_NO_COOKIE:
> >         ...
> >         smc_switch_to_fallback():
> >             clcsock->file = sk.sk_socket->file; // (1)
> >             clcsock->file->private_data = clcsock; // (2)
> >
> > (CPU 2) __sys_sendmmsg():
> >     sockfd_lookup_light():
> >         sock_from_file():
> >             sock = file->private_data; // (3)
> >     ...
> >     fput_light(sock->file, fput_needed): // (4)
> >         fput():
> >             refcount_dec_and_test(sock->file->f_count) // null-ptr-deref
> >
> > There is no memory barrier between (1) and (2), so (1) might be reordered
> > after
> > (2) is written to memory. Then, execution order can be (2)->(3)->(4)->(1)
> > and (4) will read uninitialized value which may cause system crash.
> >
> >
> > This kind of reordering may happen in smc_ulp_init():
> >
> > (CPU 1) smc_ulp_init():
> >     ...
> >     smcsock->file = tcp->file; // (5)
> >         smcsock->file->private_data = smcsock; // (6)
> >
> > Execution order can be (6)->(3)->(4)->(5), showing same symptom as above.
> >
> >
> > One possible solution seems to be adding release semantic in (2) and (6).
> >
> > diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> > index 4b52b3b159c0..37c23ef3e2d5 100644
> > --- a/net/smc/af_smc.c
> > +++ b/net/smc/af_smc.c
> > @@ -921,7 +921,7 @@ static int smc_switch_to_fallback(struct smc_sock
> > *smc, int reason_code)
> >         trace_smc_switch_to_fallback(smc, reason_code);
> >         if (smc->sk.sk_socket && smc->sk.sk_socket->file) {
> >                 smc->clcsock->file = smc->sk.sk_socket->file;
> > -               smc->clcsock->file->private_data = smc->clcsock;
> > +               smp_store_release(&smc->clcsock->file->private_data,
> > smc->clcsock);
> >                 smc->clcsock->wq.fasync_list =
> >                         smc->sk.sk_socket->wq.fasync_list;
> >                 smc->sk.sk_socket->wq.fasync_list = NULL;
> > @@ -3410,7 +3410,7 @@ static int smc_ulp_init(struct sock *sk)
> >
> >         /* replace tcp socket to smc */
> >         smcsock->file = tcp->file;
> > -       smcsock->file->private_data = smcsock;
> > +       smp_store_release(&smcsock->file->private_data, smcsock);
> >         smcsock->file->f_inode = SOCK_INODE(smcsock); /* replace inode
> > when sock_close */
> >         smcsock->file->f_path.dentry->d_inode = SOCK_INODE(smcsock); /*
> > dput() in __fput */
> >         tcp->file = NULL;
> >
> > I think we don't need memory barrier between (3) and (4) because there are
> > critical section between (3) and (4), so lock(lock_sock/release_sock) will
> > do this.
> >
> >
> > Could you check these? If confirmed to be a bug, we will send a patch.
> >
> > Best Regards,
> > Yewon Choi
> >
> 
> Additionally, we found that below line (1) in smc_ulp_init() triggers 
> kernel panic even when normaly executed. 
> 
> smc_ulp_init():
>     ...
>     tcp->file = NULL; // (1)
> 
> It can be triggered by simple system calls: 
>     int sk = socket(0xa, 0x1, 0)
>     setsockopt(sk, 0x6, 0x1f, "smc", sizeof("smc"))
> 

SMC ULP isn't as widely used as we had hoped, because it has some
potential race conditions when interacting with files. Thanks for your
findings, and I will remove this ULP once its alternative solution,
eBPF with IPROTO_SMC proposal, is sent out. For now, it should be
considered as deprecated.

For the two scenarios above, I'll go over them.

Thanks,
Tony Lu
diff mbox series

Patch

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 4b52b3b159c0..37c23ef3e2d5 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -921,7 +921,7 @@  static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
        trace_smc_switch_to_fallback(smc, reason_code);
        if (smc->sk.sk_socket && smc->sk.sk_socket->file) {
                smc->clcsock->file = smc->sk.sk_socket->file;
-               smc->clcsock->file->private_data = smc->clcsock;
+               smp_store_release(&smc->clcsock->file->private_data, smc->clcsock);
                smc->clcsock->wq.fasync_list =
                        smc->sk.sk_socket->wq.fasync_list;
                smc->sk.sk_socket->wq.fasync_list = NULL;
@@ -3410,7 +3410,7 @@  static int smc_ulp_init(struct sock *sk)
 
        /* replace tcp socket to smc */
        smcsock->file = tcp->file;
-       smcsock->file->private_data = smcsock;
+       smp_store_release(&smcsock->file->private_data, smcsock);
        smcsock->file->f_inode = SOCK_INODE(smcsock); /* replace inode when sock_close */
        smcsock->file->f_path.dentry->d_inode = SOCK_INODE(smcsock); /* dput() in __fput */
        tcp->file = NULL;