Message ID | 20240518011346.36248-1-kuniyu@amazon.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v1,net] af_unix: Annotate data-races around sk->sk_hash. | expand |
On Sat, 18 May 2024 at 03:14, 'Kuniyuki Iwashima' via syzkaller <syzkaller@googlegroups.com> wrote: > > syzkaller reported data-race of sk->sk_hash in unix_autobind() [0], > and the same ones exist in unix_bind_bsd() and unix_bind_abstract(). > > The three bind() functions prefetch sk->sk_hash locklessly and > use it later after validating that unix_sk(sk)->addr is NULL under > unix_sk(sk)->bindlock. > > The prefetched sk->sk_hash is the hash value of unbound socket set > in unix_create1() and does not change until bind() completes. > > There could be a chance that sk->sk_hash changes after the lockless > read. However, in such a case, non-NULL unix_sk(sk)->addr is visible > under unix_sk(sk)->bindlock, and bind() returns -EINVAL without using > the prefetched value. > > The KCSAN splat is false-positive, but let's use WRITE_ONCE() and > READ_ONCE() to silence it. > > [0]: > BUG: KCSAN: data-race in unix_autobind / unix_autobind > > write to 0xffff888034a9fb88 of 4 bytes by task 4468 on cpu 0: > __unix_set_addr_hash net/unix/af_unix.c:331 [inline] > unix_autobind+0x47a/0x7d0 net/unix/af_unix.c:1185 > unix_dgram_connect+0x7e3/0x890 net/unix/af_unix.c:1373 > __sys_connect_file+0xd7/0xe0 net/socket.c:2048 > __sys_connect+0x114/0x140 net/socket.c:2065 > __do_sys_connect net/socket.c:2075 [inline] > __se_sys_connect net/socket.c:2072 [inline] > __x64_sys_connect+0x40/0x50 net/socket.c:2072 > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83 > entry_SYSCALL_64_after_hwframe+0x46/0x4e > > read to 0xffff888034a9fb88 of 4 bytes by task 4465 on cpu 1: > unix_autobind+0x28/0x7d0 net/unix/af_unix.c:1134 > unix_dgram_connect+0x7e3/0x890 net/unix/af_unix.c:1373 > __sys_connect_file+0xd7/0xe0 net/socket.c:2048 > __sys_connect+0x114/0x140 net/socket.c:2065 > __do_sys_connect net/socket.c:2075 [inline] > __se_sys_connect net/socket.c:2072 [inline] > __x64_sys_connect+0x40/0x50 net/socket.c:2072 > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83 > entry_SYSCALL_64_after_hwframe+0x46/0x4e > > value changed: 0x000000e4 -> 0x000001e3 > > Reported by Kernel Concurrency Sanitizer on: > CPU: 1 PID: 4465 Comm: syz-executor.0 Not tainted 6.8.0-12822-gcd51db110a7e #12 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 > > Fixes: afd20b9290e1 ("af_unix: Replace the big lock with small locks.") > Reported-by: syzkaller <syzkaller@googlegroups.com> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> > --- > net/unix/af_unix.c | 9 ++++----- > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > index 92a88ac070ca..e92b45e21664 100644 > --- a/net/unix/af_unix.c > +++ b/net/unix/af_unix.c > @@ -327,8 +327,7 @@ static void __unix_set_addr_hash(struct net *net, struct sock *sk, > { > __unix_remove_socket(sk); > smp_store_release(&unix_sk(sk)->addr, addr); > - > - sk->sk_hash = hash; > + WRITE_ONCE(sk->sk_hash, hash); > __unix_insert_socket(net, sk); > } > > @@ -1131,7 +1130,7 @@ static struct sock *unix_find_other(struct net *net, > > static int unix_autobind(struct sock *sk) > { > - unsigned int new_hash, old_hash = sk->sk_hash; > + unsigned int new_hash, old_hash = READ_ONCE(sk->sk_hash); > struct unix_sock *u = unix_sk(sk); > struct net *net = sock_net(sk); > struct unix_address *addr; > @@ -1195,7 +1194,7 @@ static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr, > { > umode_t mode = S_IFSOCK | > (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask()); > - unsigned int new_hash, old_hash = sk->sk_hash; > + unsigned int new_hash, old_hash = READ_ONCE(sk->sk_hash); > struct unix_sock *u = unix_sk(sk); > struct net *net = sock_net(sk); > struct mnt_idmap *idmap; > @@ -1261,7 +1260,7 @@ static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr, > static int unix_bind_abstract(struct sock *sk, struct sockaddr_un *sunaddr, > int addr_len) > { > - unsigned int new_hash, old_hash = sk->sk_hash; > + unsigned int new_hash, old_hash = READ_ONCE(sk->sk_hash); > struct unix_sock *u = unix_sk(sk); > struct net *net = sock_net(sk); > struct unix_address *addr; Hi, I don't know much about this code, but perhaps these accesses must be protected by bindlock instead? It shouldn't autobind twice, right? Perhaps the code just tried to save a line of code and moved the reads to the variable declaration section.
On Tue, 2024-05-21 at 06:16 +0200, Dmitry Vyukov wrote: > On Sat, 18 May 2024 at 03:14, 'Kuniyuki Iwashima' via syzkaller > <syzkaller@googlegroups.com> wrote: > > > > syzkaller reported data-race of sk->sk_hash in unix_autobind() [0], > > and the same ones exist in unix_bind_bsd() and unix_bind_abstract(). > > > > The three bind() functions prefetch sk->sk_hash locklessly and > > use it later after validating that unix_sk(sk)->addr is NULL under > > unix_sk(sk)->bindlock. > > > > The prefetched sk->sk_hash is the hash value of unbound socket set > > in unix_create1() and does not change until bind() completes. > > > > There could be a chance that sk->sk_hash changes after the lockless > > read. However, in such a case, non-NULL unix_sk(sk)->addr is visible > > under unix_sk(sk)->bindlock, and bind() returns -EINVAL without using > > the prefetched value. > > > > The KCSAN splat is false-positive, but let's use WRITE_ONCE() and > > READ_ONCE() to silence it. > > > > [0]: > > BUG: KCSAN: data-race in unix_autobind / unix_autobind > > > > write to 0xffff888034a9fb88 of 4 bytes by task 4468 on cpu 0: > > __unix_set_addr_hash net/unix/af_unix.c:331 [inline] > > unix_autobind+0x47a/0x7d0 net/unix/af_unix.c:1185 > > unix_dgram_connect+0x7e3/0x890 net/unix/af_unix.c:1373 > > __sys_connect_file+0xd7/0xe0 net/socket.c:2048 > > __sys_connect+0x114/0x140 net/socket.c:2065 > > __do_sys_connect net/socket.c:2075 [inline] > > __se_sys_connect net/socket.c:2072 [inline] > > __x64_sys_connect+0x40/0x50 net/socket.c:2072 > > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > > do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83 > > entry_SYSCALL_64_after_hwframe+0x46/0x4e > > > > read to 0xffff888034a9fb88 of 4 bytes by task 4465 on cpu 1: > > unix_autobind+0x28/0x7d0 net/unix/af_unix.c:1134 > > unix_dgram_connect+0x7e3/0x890 net/unix/af_unix.c:1373 > > __sys_connect_file+0xd7/0xe0 net/socket.c:2048 > > __sys_connect+0x114/0x140 net/socket.c:2065 > > __do_sys_connect net/socket.c:2075 [inline] > > __se_sys_connect net/socket.c:2072 [inline] > > __x64_sys_connect+0x40/0x50 net/socket.c:2072 > > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > > do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83 > > entry_SYSCALL_64_after_hwframe+0x46/0x4e > > > > value changed: 0x000000e4 -> 0x000001e3 > > > > Reported by Kernel Concurrency Sanitizer on: > > CPU: 1 PID: 4465 Comm: syz-executor.0 Not tainted 6.8.0-12822-gcd51db110a7e #12 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 > > > > Fixes: afd20b9290e1 ("af_unix: Replace the big lock with small locks.") > > Reported-by: syzkaller <syzkaller@googlegroups.com> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> > > --- > > net/unix/af_unix.c | 9 ++++----- > > 1 file changed, 4 insertions(+), 5 deletions(-) > > > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > > index 92a88ac070ca..e92b45e21664 100644 > > --- a/net/unix/af_unix.c > > +++ b/net/unix/af_unix.c > > @@ -327,8 +327,7 @@ static void __unix_set_addr_hash(struct net *net, struct sock *sk, > > { > > __unix_remove_socket(sk); > > smp_store_release(&unix_sk(sk)->addr, addr); > > - > > - sk->sk_hash = hash; > > + WRITE_ONCE(sk->sk_hash, hash); > > __unix_insert_socket(net, sk); > > } > > > > @@ -1131,7 +1130,7 @@ static struct sock *unix_find_other(struct net *net, > > > > static int unix_autobind(struct sock *sk) > > { > > - unsigned int new_hash, old_hash = sk->sk_hash; > > + unsigned int new_hash, old_hash = READ_ONCE(sk->sk_hash); > > struct unix_sock *u = unix_sk(sk); > > struct net *net = sock_net(sk); > > struct unix_address *addr; > > @@ -1195,7 +1194,7 @@ static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr, > > { > > umode_t mode = S_IFSOCK | > > (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask()); > > - unsigned int new_hash, old_hash = sk->sk_hash; > > + unsigned int new_hash, old_hash = READ_ONCE(sk->sk_hash); > > struct unix_sock *u = unix_sk(sk); > > struct net *net = sock_net(sk); > > struct mnt_idmap *idmap; > > @@ -1261,7 +1260,7 @@ static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr, > > static int unix_bind_abstract(struct sock *sk, struct sockaddr_un *sunaddr, > > int addr_len) > > { > > - unsigned int new_hash, old_hash = sk->sk_hash; > > + unsigned int new_hash, old_hash = READ_ONCE(sk->sk_hash); > > struct unix_sock *u = unix_sk(sk); > > struct net *net = sock_net(sk); > > struct unix_address *addr; > > > > Hi, > > I don't know much about this code, but perhaps these accesses must be > protected by bindlock instead? > It shouldn't autobind twice, right? Perhaps the code just tried to > save a line of code and moved the reads to the variable declaration > section. I also think that sk_hash is/should be under bindlock protection, and thus moving the read should be better. Otherwise even the first sk->sk_hash in unix_insert_bsd_socket() would be 'lockless' - prior/outside to the table lock. Thanks, Paolo
From: Paolo Abeni <pabeni@redhat.com> Date: Wed, 22 May 2024 11:13:37 +0200 > On Tue, 2024-05-21 at 06:16 +0200, Dmitry Vyukov wrote: > > On Sat, 18 May 2024 at 03:14, 'Kuniyuki Iwashima' via syzkaller > > <syzkaller@googlegroups.com> wrote: > > > > > > syzkaller reported data-race of sk->sk_hash in unix_autobind() [0], > > > and the same ones exist in unix_bind_bsd() and unix_bind_abstract(). > > > > > > The three bind() functions prefetch sk->sk_hash locklessly and > > > use it later after validating that unix_sk(sk)->addr is NULL under > > > unix_sk(sk)->bindlock. > > > > > > The prefetched sk->sk_hash is the hash value of unbound socket set > > > in unix_create1() and does not change until bind() completes. > > > > > > There could be a chance that sk->sk_hash changes after the lockless > > > read. However, in such a case, non-NULL unix_sk(sk)->addr is visible > > > under unix_sk(sk)->bindlock, and bind() returns -EINVAL without using > > > the prefetched value. > > > > > > The KCSAN splat is false-positive, but let's use WRITE_ONCE() and > > > READ_ONCE() to silence it. > > > > > > [0]: > > > BUG: KCSAN: data-race in unix_autobind / unix_autobind > > > > > > write to 0xffff888034a9fb88 of 4 bytes by task 4468 on cpu 0: > > > __unix_set_addr_hash net/unix/af_unix.c:331 [inline] > > > unix_autobind+0x47a/0x7d0 net/unix/af_unix.c:1185 > > > unix_dgram_connect+0x7e3/0x890 net/unix/af_unix.c:1373 > > > __sys_connect_file+0xd7/0xe0 net/socket.c:2048 > > > __sys_connect+0x114/0x140 net/socket.c:2065 > > > __do_sys_connect net/socket.c:2075 [inline] > > > __se_sys_connect net/socket.c:2072 [inline] > > > __x64_sys_connect+0x40/0x50 net/socket.c:2072 > > > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > > > do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83 > > > entry_SYSCALL_64_after_hwframe+0x46/0x4e > > > > > > read to 0xffff888034a9fb88 of 4 bytes by task 4465 on cpu 1: > > > unix_autobind+0x28/0x7d0 net/unix/af_unix.c:1134 > > > unix_dgram_connect+0x7e3/0x890 net/unix/af_unix.c:1373 > > > __sys_connect_file+0xd7/0xe0 net/socket.c:2048 > > > __sys_connect+0x114/0x140 net/socket.c:2065 > > > __do_sys_connect net/socket.c:2075 [inline] > > > __se_sys_connect net/socket.c:2072 [inline] > > > __x64_sys_connect+0x40/0x50 net/socket.c:2072 > > > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > > > do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83 > > > entry_SYSCALL_64_after_hwframe+0x46/0x4e > > > > > > value changed: 0x000000e4 -> 0x000001e3 > > > > > > Reported by Kernel Concurrency Sanitizer on: > > > CPU: 1 PID: 4465 Comm: syz-executor.0 Not tainted 6.8.0-12822-gcd51db110a7e #12 > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 > > > > > > Fixes: afd20b9290e1 ("af_unix: Replace the big lock with small locks.") > > > Reported-by: syzkaller <syzkaller@googlegroups.com> > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> > > > --- > > > net/unix/af_unix.c | 9 ++++----- > > > 1 file changed, 4 insertions(+), 5 deletions(-) > > > > > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > > > index 92a88ac070ca..e92b45e21664 100644 > > > --- a/net/unix/af_unix.c > > > +++ b/net/unix/af_unix.c > > > @@ -327,8 +327,7 @@ static void __unix_set_addr_hash(struct net *net, struct sock *sk, > > > { > > > __unix_remove_socket(sk); > > > smp_store_release(&unix_sk(sk)->addr, addr); > > > - > > > - sk->sk_hash = hash; > > > + WRITE_ONCE(sk->sk_hash, hash); > > > __unix_insert_socket(net, sk); > > > } > > > > > > @@ -1131,7 +1130,7 @@ static struct sock *unix_find_other(struct net *net, > > > > > > static int unix_autobind(struct sock *sk) > > > { > > > - unsigned int new_hash, old_hash = sk->sk_hash; > > > + unsigned int new_hash, old_hash = READ_ONCE(sk->sk_hash); > > > struct unix_sock *u = unix_sk(sk); > > > struct net *net = sock_net(sk); > > > struct unix_address *addr; > > > @@ -1195,7 +1194,7 @@ static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr, > > > { > > > umode_t mode = S_IFSOCK | > > > (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask()); > > > - unsigned int new_hash, old_hash = sk->sk_hash; > > > + unsigned int new_hash, old_hash = READ_ONCE(sk->sk_hash); > > > struct unix_sock *u = unix_sk(sk); > > > struct net *net = sock_net(sk); > > > struct mnt_idmap *idmap; > > > @@ -1261,7 +1260,7 @@ static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr, > > > static int unix_bind_abstract(struct sock *sk, struct sockaddr_un *sunaddr, > > > int addr_len) > > > { > > > - unsigned int new_hash, old_hash = sk->sk_hash; > > > + unsigned int new_hash, old_hash = READ_ONCE(sk->sk_hash); > > > struct unix_sock *u = unix_sk(sk); > > > struct net *net = sock_net(sk); > > > struct unix_address *addr; > > > > > > > > Hi, > > > > I don't know much about this code, but perhaps these accesses must be > > protected by bindlock instead? > > It shouldn't autobind twice, right? Perhaps the code just tried to > > save a line of code and moved the reads to the variable declaration > > section. > > I also think that sk_hash is/should be under bindlock protection, and > thus moving the read should be better. I thought ->addr check after bindlock is enough but I don't have strong preference. Will move the read after bindlock. > > Otherwise even the first sk->sk_hash in unix_insert_bsd_socket() would > be 'lockless' - prior/outside to the table lock. Once u->addr is set during bind(), it's fine to read sk_hash locklessly without READ_ONCE(). Thanks!
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 92a88ac070ca..e92b45e21664 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -327,8 +327,7 @@ static void __unix_set_addr_hash(struct net *net, struct sock *sk, { __unix_remove_socket(sk); smp_store_release(&unix_sk(sk)->addr, addr); - - sk->sk_hash = hash; + WRITE_ONCE(sk->sk_hash, hash); __unix_insert_socket(net, sk); } @@ -1131,7 +1130,7 @@ static struct sock *unix_find_other(struct net *net, static int unix_autobind(struct sock *sk) { - unsigned int new_hash, old_hash = sk->sk_hash; + unsigned int new_hash, old_hash = READ_ONCE(sk->sk_hash); struct unix_sock *u = unix_sk(sk); struct net *net = sock_net(sk); struct unix_address *addr; @@ -1195,7 +1194,7 @@ static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr, { umode_t mode = S_IFSOCK | (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask()); - unsigned int new_hash, old_hash = sk->sk_hash; + unsigned int new_hash, old_hash = READ_ONCE(sk->sk_hash); struct unix_sock *u = unix_sk(sk); struct net *net = sock_net(sk); struct mnt_idmap *idmap; @@ -1261,7 +1260,7 @@ static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr, static int unix_bind_abstract(struct sock *sk, struct sockaddr_un *sunaddr, int addr_len) { - unsigned int new_hash, old_hash = sk->sk_hash; + unsigned int new_hash, old_hash = READ_ONCE(sk->sk_hash); struct unix_sock *u = unix_sk(sk); struct net *net = sock_net(sk); struct unix_address *addr;
syzkaller reported data-race of sk->sk_hash in unix_autobind() [0], and the same ones exist in unix_bind_bsd() and unix_bind_abstract(). The three bind() functions prefetch sk->sk_hash locklessly and use it later after validating that unix_sk(sk)->addr is NULL under unix_sk(sk)->bindlock. The prefetched sk->sk_hash is the hash value of unbound socket set in unix_create1() and does not change until bind() completes. There could be a chance that sk->sk_hash changes after the lockless read. However, in such a case, non-NULL unix_sk(sk)->addr is visible under unix_sk(sk)->bindlock, and bind() returns -EINVAL without using the prefetched value. The KCSAN splat is false-positive, but let's use WRITE_ONCE() and READ_ONCE() to silence it. [0]: BUG: KCSAN: data-race in unix_autobind / unix_autobind write to 0xffff888034a9fb88 of 4 bytes by task 4468 on cpu 0: __unix_set_addr_hash net/unix/af_unix.c:331 [inline] unix_autobind+0x47a/0x7d0 net/unix/af_unix.c:1185 unix_dgram_connect+0x7e3/0x890 net/unix/af_unix.c:1373 __sys_connect_file+0xd7/0xe0 net/socket.c:2048 __sys_connect+0x114/0x140 net/socket.c:2065 __do_sys_connect net/socket.c:2075 [inline] __se_sys_connect net/socket.c:2072 [inline] __x64_sys_connect+0x40/0x50 net/socket.c:2072 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x46/0x4e read to 0xffff888034a9fb88 of 4 bytes by task 4465 on cpu 1: unix_autobind+0x28/0x7d0 net/unix/af_unix.c:1134 unix_dgram_connect+0x7e3/0x890 net/unix/af_unix.c:1373 __sys_connect_file+0xd7/0xe0 net/socket.c:2048 __sys_connect+0x114/0x140 net/socket.c:2065 __do_sys_connect net/socket.c:2075 [inline] __se_sys_connect net/socket.c:2072 [inline] __x64_sys_connect+0x40/0x50 net/socket.c:2072 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x46/0x4e value changed: 0x000000e4 -> 0x000001e3 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 4465 Comm: syz-executor.0 Not tainted 6.8.0-12822-gcd51db110a7e #12 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Fixes: afd20b9290e1 ("af_unix: Replace the big lock with small locks.") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> --- net/unix/af_unix.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)