[v1,net-next,4/6] af_unix: Acquire/Release per-netns hash table's locks.

Message ID: 20220616234714.4291-5-kuniyu@amazon.com
State: Superseded
Delegated to: Netdev Maintainers
Series: af_unix: Introduce per-netns socket hash table.

Checks

Context                         Check    Description
netdev/tree_selection           success  Clearly marked for net-next
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/subject_prefix           success  Link
netdev/cover_letter             success  Series has a cover letter
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 9 this patch: 9
netdev/cc_maintainers           warning  1 maintainers not CCed: viro@zeniv.linux.org.uk
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 9 this patch: 9
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 335 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Kuniyuki Iwashima June 16, 2022, 11:47 p.m. UTC
This commit adds spin_lock()/spin_unlock() calls for the per-netns
hash table's locks, nested inside the existing critical sections
protected by unix_table_locks.

As of this commit, sockets are still linked in the global hash
table.  The next patch moves sockets into the per-netns hash
table, and the last patch of this series removes the global hash
table.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 75 +++++++++++++++++++++++++++++++---------------
 net/unix/diag.c    | 23 +++++++++-----
 2 files changed, 66 insertions(+), 32 deletions(-)

Comments

kernel test robot June 20, 2022, 6:10 a.m. UTC | #1
Greetings,

FYI, we noticed the following commit (built with gcc-11):

commit: b4813d591454d771b5aaf33a6252b214648c430f ("[PATCH v1 net-next 4/6] af_unix: Acquire/Release per-netns hash table's locks.")
url: https://github.com/intel-lab-lkp/linux/commits/Kuniyuki-Iwashima/af_unix-Introduce-per-netns-socket-hash-table/20220617-075046
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git 5dcb50c009c9f8ec1cfca6a81a05c0060a5bbf68
patch link: https://lore.kernel.org/netdev/20220616234714.4291-5-kuniyu@amazon.com

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):

If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <oliver.sang@intel.com>


[  113.085258][    T1] WARNING: possible recursive locking detected
[  113.085261][    T1] 5.19.0-rc1-00408-gb4813d591454 #1 Not tainted
[  113.085264][    T1] --------------------------------------------
[  113.085265][    T1] systemd/1 is trying to acquire lock:
[ 113.085270][ T1] ffff888167ee6c18 (&net->unx.hash[i].lock){+.+.}-{2:2}, at: unix_bind_bsd (net/unix/af_unix.c:1200) 
[  113.085313][    T1]
[  113.085313][    T1] but task is already holding lock:
[ 113.085314][ T1] ffff888167ee0918 (&net->unx.hash[i].lock){+.+.}-{2:2}, at: unix_bind_bsd (net/unix/af_unix.c:175 net/unix/af_unix.c:1199) 
[  113.085321][    T1]
[  113.085321][    T1] other info that might help us debug this:
[  113.085323][    T1]  Possible unsafe locking scenario:
[  113.085323][    T1]
[  113.085324][    T1]        CPU0
[  113.085325][    T1]        ----
[  113.085325][    T1]   lock(&net->unx.hash[i].lock);
[  113.085328][    T1]   lock(&net->unx.hash[i].lock);
[  113.085330][    T1]
[  113.085330][    T1]  *** DEADLOCK ***
[  113.085330][    T1]
[  113.085331][    T1]  May be due to missing lock nesting notation
[  113.085331][    T1]
[  113.085333][    T1] 6 locks held by systemd/1:
[ 113.085335][ T1] #0: ffff88815da40448 (sb_writers#6){.+.+}-{0:0}, at: filename_create (fs/namei.c:3744) 
[ 113.085351][ T1] #1: ffff88815bffec40 (&type->i_mutex_dir_key#4/1){+.+.}-{3:3}, at: filename_create (fs/namei.c:3747) 
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Reached target Paths.
[  OK  ] Listening on udev Control Socket.
[ 113.085359][ T1] #2: ffff88815d974e18 (&u->bindlock){+.+.}-{3:3}, at: unix_bind_bsd (net/unix/af_unix.c:1192) 
[ 113.085370][ T1] #3: ffffffffb0eec038 (&unix_table_locks[i]){+.+.}-{2:2}, at: unix_bind_bsd (net/unix/af_unix.c:172 net/unix/af_unix.c:1199) 
[ 113.085377][ T1] #4: ffffffffb0ef1838 (&unix_table_locks[i]/1){+.+.}-{2:2}, at: unix_bind_bsd (net/unix/af_unix.c:174 net/unix/af_unix.c:1199) 
[ 113.085384][ T1] #5: ffff888167ee0918 (&net->unx.hash[i].lock){+.+.}-{2:2}, at: unix_bind_bsd (net/unix/af_unix.c:175 net/unix/af_unix.c:1199) 
[  113.085391][    T1]
[  113.085391][    T1] stack backtrace:
[  113.085395][    T1] CPU: 1 PID: 1 Comm: systemd Not tainted 5.19.0-rc1-00408-gb4813d591454 #1
[  113.085401][    T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[  113.085408][    T1] Call Trace:
[  113.085419][    T1]  <TASK>
[ 113.085421][ T1] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
[ 113.085453][ T1] validate_chain.cold (kernel/locking/lockdep.c:2988 kernel/locking/lockdep.c:3031 kernel/locking/lockdep.c:3816) 
[ 113.085473][ T1] ? check_prev_add (kernel/locking/lockdep.c:3785) 
[ 113.085483][ T1] ? rcu_read_unlock (include/linux/rcupdate.h:724 (discriminator 5)) 
[ 113.085489][ T1] __lock_acquire (kernel/locking/lockdep.c:5053) 
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
[  OK  ] Reached target Encrypted Volumes.
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[ 113.085497][ T1] ? rcu_read_unlock (include/linux/rcupdate.h:724 (discriminator 5)) 
[ 113.085501][ T1] lock_acquire (kernel/locking/lockdep.c:466 kernel/locking/lockdep.c:5667 kernel/locking/lockdep.c:5630) 
[ 113.085504][ T1] ? unix_bind_bsd (net/unix/af_unix.c:1200) 
[ 113.085509][ T1] ? rcu_read_unlock (include/linux/rcupdate.h:724 (discriminator 5)) 
[ 113.085513][ T1] ? do_raw_spin_lock (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:111 kernel/locking/spinlock_debug.c:115) 
[ 113.085519][ T1] ? rwlock_bug+0xc0/0xc0 
[  OK  ] Created slice User and Session Slice.
[ 113.085524][ T1] _raw_spin_lock (include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154) 
[ 113.085539][ T1] ? unix_bind_bsd (net/unix/af_unix.c:1200) 
[ 113.085543][ T1] unix_bind_bsd (net/unix/af_unix.c:1200) 
[ 113.085548][ T1] ? __might_fault (mm/memory.c:5566 mm/memory.c:5559) 
[ 113.085557][ T1] ? unix_stream_sendmsg (net/unix/af_unix.c:1153) 
[  OK  ] Created slice System Slice.
[ 113.085560][ T1] ? lock_release (kernel/locking/lockdep.c:466 kernel/locking/lockdep.c:5687) 
[ 113.085563][ T1] ? _copy_from_user (arch/x86/include/asm/uaccess_64.h:46 arch/x86/include/asm/uaccess_64.h:52 lib/usercopy.c:16) 
[ 113.085580][ T1] __sys_bind (net/socket.c:1776) 
[ 113.085589][ T1] ? __ia32_sys_socketpair (net/socket.c:1763) 
[ 113.085592][ T1] ? __lock_release (kernel/locking/lockdep.c:5341) 
[ 113.085597][ T1] ? lock_is_held_type (kernel/locking/lockdep.c:5406 kernel/locking/lockdep.c:5708) 
[ 113.085606][ T1] ? __might_fault (mm/memory.c:5566 mm/memory.c:5559) 
[ 113.085610][ T1] ? lock_release (kernel/locking/lockdep.c:466 kernel/locking/lockdep.c:5687) 
[ 113.085614][ T1] __do_compat_sys_socketcall (net/compat.c:453) 
[ 113.085627][ T1] ? __x64_sys_rmdir (fs/namei.c:4221) 
[ 113.085631][ T1] ? __ia32_compat_sys_recvmmsg_time32 (net/compat.c:425) 
[ 113.085637][ T1] ? syscall_exit_to_user_mode (kernel/entry/common.c:129 kernel/entry/common.c:296) 
[ 113.085642][ T1] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526) 
[ 113.085646][ T1] __do_fast_syscall_32 (arch/x86/entry/common.c:112 arch/x86/entry/common.c:178) 
[ 113.085652][ T1] ? __do_fast_syscall_32 (arch/x86/entry/common.c:183) 
Mounting Debug File System...
[ 113.085656][ T1] do_fast_syscall_32 (arch/x86/entry/common.c:203) 
[ 113.085660][ T1] entry_SYSENTER_compat_after_hwframe (arch/x86/entry/entry_64_compat.S:117) 
[  113.085669][    T1] RIP: 0023:0xf7f70549
[ 113.085673][ T1] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
All code
========
   0:	03 74 c0 01          	add    0x1(%rax,%rax,8),%esi
   4:	10 05 03 74 b8 01    	adc    %al,0x1b87403(%rip)        # 0x1b8740d
   a:	10 06                	adc    %al,(%rsi)
   c:	03 74 b4 01          	add    0x1(%rsp,%rsi,4),%esi
  10:	10 07                	adc    %al,(%rdi)
  12:	03 74 b0 01          	add    0x1(%rax,%rsi,4),%esi
  16:	10 08                	adc    %cl,(%rax)
  18:	03 74 d8 01          	add    0x1(%rax,%rbx,8),%esi
  1c:	00 00                	add    %al,(%rax)
  1e:	00 00                	add    %al,(%rax)
  20:	00 51 52             	add    %dl,0x52(%rcx)
  23:	55                   	push   %rbp
  24:	89 e5                	mov    %esp,%ebp
  26:	0f 34                	sysenter 
  28:	cd 80                	int    $0x80
  2a:*	5d                   	pop    %rbp		<-- trapping instruction
  2b:	5a                   	pop    %rdx
  2c:	59                   	pop    %rcx
  2d:	c3                   	retq   
  2e:	90                   	nop
  2f:	90                   	nop
  30:	90                   	nop
  31:	90                   	nop
  32:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi
  39:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi

Code starting with the faulting instruction
===========================================
   0:	5d                   	pop    %rbp
   1:	5a                   	pop    %rdx
   2:	59                   	pop    %rcx
   3:	c3                   	retq   
   4:	90                   	nop
   5:	90                   	nop
   6:	90                   	nop
   7:	90                   	nop
   8:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi
   f:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi


To reproduce:

        # build kernel
        cd linux
        cp config-5.19.0-rc1-00408-gb4813d591454 .config
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
        cd <mod-install-dir>
        find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.
Kuniyuki Iwashima June 20, 2022, 4:47 p.m. UTC | #2
From:   kernel test robot <oliver.sang@intel.com>
Date:   Mon, 20 Jun 2022 14:10:53 +0800
> [  113.085258][    T1] WARNING: possible recursive locking detected
> [  113.085261][    T1] 5.19.0-rc1-00408-gb4813d591454 #1 Not tainted
> [  113.085264][    T1] --------------------------------------------
> [  113.085265][    T1] systemd/1 is trying to acquire lock:
> [ 113.085270][ T1] ffff888167ee6c18 (&net->unx.hash[i].lock){+.+.}-{2:2}, at: unix_bind_bsd (net/unix/af_unix.c:1200) 
> [  113.085313][    T1]
> [  113.085313][    T1] but task is already holding lock:
> [ 113.085314][ T1] ffff888167ee0918 (&net->unx.hash[i].lock){+.+.}-{2:2}, at: unix_bind_bsd (net/unix/af_unix.c:175 net/unix/af_unix.c:1199) 
> [  113.085321][    T1]
> [  113.085321][    T1] other info that might help us debug this:
> [  113.085323][    T1]  Possible unsafe locking scenario:
> [  113.085323][    T1]
> [  113.085324][    T1]        CPU0
> [  113.085325][    T1]        ----
> [  113.085325][    T1]   lock(&net->unx.hash[i].lock);
> [  113.085328][    T1]   lock(&net->unx.hash[i].lock);
> [  113.085330][    T1]
> [  113.085330][    T1]  *** DEADLOCK ***
> [  113.085330][    T1]
> [  113.085331][    T1]  May be due to missing lock nesting notation

Sorry, that was a copy-and-paste mistake on my part.
I'll use spin_lock_nested() in unix_table_double_lock().
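
For readers following the report, here is a minimal userspace sketch of why
the second per-netns acquisition trips lockdep and why a nesting annotation
would silence it. It is a simplified model, not kernel code: spinlock_t,
spin_lock(), spin_lock_nested() and spin_unlock() below are stand-ins that
only track (lock class, subclass) pairs, MODEL_HASH_SIZE, MODEL_MAX_HELD,
the "annotate" parameter and double_lock_warns() are hypothetical names
introduced for illustration, and the placement of the annotation on the
hash2 lock is an assumption based on the reply above, not the actual v2
patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for illustration only; NOT the kernel implementation. */
#define SINGLE_DEPTH_NESTING 1
#define MODEL_HASH_SIZE 4	/* hypothetical, far smaller than the kernel's */
#define MODEL_MAX_HELD 8

typedef struct { int lock_class; } spinlock_t;

struct held_lock { spinlock_t *lock; int subclass; };

static struct held_lock held[MODEL_MAX_HELD];
static int nr_held;
static bool recursion_warned;

/* Flag an acquisition whose (class, subclass) pair is already held. */
static void spin_lock_nested(spinlock_t *lock, int subclass)
{
	for (int i = 0; i < nr_held; i++)
		if (held[i].lock->lock_class == lock->lock_class &&
		    held[i].subclass == subclass)
			recursion_warned = true;

	assert(nr_held < MODEL_MAX_HELD);
	held[nr_held].lock = lock;
	held[nr_held].subclass = subclass;
	nr_held++;
}

static void spin_lock(spinlock_t *lock)
{
	spin_lock_nested(lock, 0);
}

static void spin_unlock(spinlock_t *lock)
{
	int i = nr_held - 1;

	while (i >= 0 && held[i].lock != lock)
		i--;
	assert(i >= 0);			/* the lock must currently be held */
	held[i] = held[--nr_held];
}

static spinlock_t unix_table_locks[MODEL_HASH_SIZE];

struct net {
	struct { struct { spinlock_t lock; } hash[MODEL_HASH_SIZE]; } unx;
};

static void unix_table_double_lock(struct net *net, unsigned int hash1,
				   unsigned int hash2, bool annotate)
{
	spin_lock(&unix_table_locks[hash1]);
	spin_lock_nested(&unix_table_locks[hash2], SINGLE_DEPTH_NESTING);

	spin_lock(&net->unx.hash[hash1].lock);
	if (annotate)	/* assumed shape of the fix */
		spin_lock_nested(&net->unx.hash[hash2].lock,
				 SINGLE_DEPTH_NESTING);
	else		/* as posted in v1 */
		spin_lock(&net->unx.hash[hash2].lock);
}

static void unix_table_double_unlock(struct net *net, unsigned int hash1,
				     unsigned int hash2)
{
	spin_unlock(&net->unx.hash[hash1].lock);
	spin_unlock(&net->unx.hash[hash2].lock);

	spin_unlock(&unix_table_locks[hash1]);
	spin_unlock(&unix_table_locks[hash2]);
}

/* Returns true if the model would report recursive locking. */
bool double_lock_warns(bool annotate)
{
	static struct net net;

	for (int i = 0; i < MODEL_HASH_SIZE; i++) {
		unix_table_locks[i].lock_class = 1;	/* global table class */
		net.unx.hash[i].lock.lock_class = 2;	/* per-netns class */
	}
	recursion_warned = false;

	unix_table_double_lock(&net, 0, 1, annotate);
	unix_table_double_unlock(&net, 0, 1);
	return recursion_warned;
}
```

In this model, double_lock_warns(false) reports the recursion seen in the
splat, because both per-netns locks share one class and subclass 0, while
double_lock_warns(true) does not. Real lockdep tracks considerably more
state than this, so treat the sketch only as an aid for reading the report.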


> -- 
> 0-DAY CI Kernel Test Service
> https://01.org/lkp
>
Patch

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 3c07702e2349..ae21e3fb86da 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -158,7 +158,8 @@  static unsigned int unix_abstract_hash(struct sockaddr_un *sunaddr,
 	return hash & UNIX_HASH_MOD;
 }
 
-static void unix_table_double_lock(unsigned int hash1, unsigned int hash2)
+static void unix_table_double_lock(struct net *net,
+				   unsigned int hash1, unsigned int hash2)
 {
 	/* hash1 and hash2 is never the same because
 	 * one is between 0 and UNIX_HASH_MOD, and
@@ -169,10 +170,17 @@  static void unix_table_double_lock(unsigned int hash1, unsigned int hash2)
 
 	spin_lock(&unix_table_locks[hash1]);
 	spin_lock_nested(&unix_table_locks[hash2], SINGLE_DEPTH_NESTING);
+
+	spin_lock(&net->unx.hash[hash1].lock);
+	spin_lock(&net->unx.hash[hash2].lock);
 }
 
-static void unix_table_double_unlock(unsigned int hash1, unsigned int hash2)
+static void unix_table_double_unlock(struct net *net,
+				     unsigned int hash1, unsigned int hash2)
 {
+	spin_unlock(&net->unx.hash[hash1].lock);
+	spin_unlock(&net->unx.hash[hash2].lock);
+
 	spin_unlock(&unix_table_locks[hash1]);
 	spin_unlock(&unix_table_locks[hash2]);
 }
@@ -316,17 +324,21 @@  static void __unix_set_addr_hash(struct sock *sk, struct unix_address *addr,
 	__unix_insert_socket(sk);
 }
 
-static void unix_remove_socket(struct sock *sk)
+static void unix_remove_socket(struct net *net, struct sock *sk)
 {
 	spin_lock(&unix_table_locks[sk->sk_hash]);
+	spin_lock(&net->unx.hash[sk->sk_hash].lock);
 	__unix_remove_socket(sk);
+	spin_unlock(&net->unx.hash[sk->sk_hash].lock);
 	spin_unlock(&unix_table_locks[sk->sk_hash]);
 }
 
-static void unix_insert_unbound_socket(struct sock *sk)
+static void unix_insert_unbound_socket(struct net *net, struct sock *sk)
 {
 	spin_lock(&unix_table_locks[sk->sk_hash]);
+	spin_lock(&net->unx.hash[sk->sk_hash].lock);
 	__unix_insert_socket(sk);
+	spin_unlock(&net->unx.hash[sk->sk_hash].lock);
 	spin_unlock(&unix_table_locks[sk->sk_hash]);
 }
 
@@ -356,28 +368,33 @@  static inline struct sock *unix_find_socket_byname(struct net *net,
 	struct sock *s;
 
 	spin_lock(&unix_table_locks[hash]);
+	spin_lock(&net->unx.hash[hash].lock);
 	s = __unix_find_socket_byname(net, sunname, len, hash);
 	if (s)
 		sock_hold(s);
+	spin_unlock(&net->unx.hash[hash].lock);
 	spin_unlock(&unix_table_locks[hash]);
 	return s;
 }
 
-static struct sock *unix_find_socket_byinode(struct inode *i)
+static struct sock *unix_find_socket_byinode(struct net *net, struct inode *i)
 {
 	unsigned int hash = unix_bsd_hash(i);
 	struct sock *s;
 
 	spin_lock(&unix_table_locks[hash]);
+	spin_lock(&net->unx.hash[hash].lock);
 	sk_for_each(s, &unix_socket_table[hash]) {
 		struct dentry *dentry = unix_sk(s)->path.dentry;
 
 		if (dentry && d_backing_inode(dentry) == i) {
 			sock_hold(s);
+			spin_unlock(&net->unx.hash[hash].lock);
 			spin_unlock(&unix_table_locks[hash]);
 			return s;
 		}
 	}
+	spin_unlock(&net->unx.hash[hash].lock);
 	spin_unlock(&unix_table_locks[hash]);
 	return NULL;
 }
@@ -576,12 +593,12 @@  static void unix_sock_destructor(struct sock *sk)
 static void unix_release_sock(struct sock *sk, int embrion)
 {
 	struct unix_sock *u = unix_sk(sk);
-	struct path path;
 	struct sock *skpair;
 	struct sk_buff *skb;
+	struct path path;
 	int state;
 
-	unix_remove_socket(sk);
+	unix_remove_socket(sock_net(sk), sk);
 
 	/* Clear state */
 	unix_state_lock(sk);
@@ -930,7 +947,7 @@  static struct sock *unix_create1(struct net *net, struct socket *sock, int kern,
 	init_waitqueue_head(&u->peer_wait);
 	init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay);
 	memset(&u->scm_stat, 0, sizeof(struct scm_stat));
-	unix_insert_unbound_socket(sk);
+	unix_insert_unbound_socket(net, sk);
 
 	sock_prot_inuse_add(net, sk->sk_prot, 1);
 
@@ -1015,7 +1032,7 @@  static struct sock *unix_find_bsd(struct net *net, struct sockaddr_un *sunaddr,
 	if (!S_ISSOCK(inode->i_mode))
 		goto path_put;
 
-	sk = unix_find_socket_byinode(inode);
+	sk = unix_find_socket_byinode(net, inode);
 	if (!sk)
 		goto path_put;
 
@@ -1074,6 +1091,7 @@  static int unix_autobind(struct sock *sk)
 {
 	unsigned int new_hash, old_hash = sk->sk_hash;
 	struct unix_sock *u = unix_sk(sk);
+	struct net *net = sock_net(sk);
 	struct unix_address *addr;
 	u32 lastnum, ordernum;
 	int err;
@@ -1102,11 +1120,10 @@  static int unix_autobind(struct sock *sk)
 	sprintf(addr->name->sun_path + 1, "%05x", ordernum);
 
 	new_hash = unix_abstract_hash(addr->name, addr->len, sk->sk_type);
-	unix_table_double_lock(old_hash, new_hash);
+	unix_table_double_lock(net, old_hash, new_hash);
 
-	if (__unix_find_socket_byname(sock_net(sk), addr->name, addr->len,
-				      new_hash)) {
-		unix_table_double_unlock(old_hash, new_hash);
+	if (__unix_find_socket_byname(net, addr->name, addr->len, new_hash)) {
+		unix_table_double_unlock(net, old_hash, new_hash);
 
 		/* __unix_find_socket_byname() may take long time if many names
 		 * are already in use.
@@ -1124,7 +1141,7 @@  static int unix_autobind(struct sock *sk)
 	}
 
 	__unix_set_addr_hash(sk, addr, new_hash);
-	unix_table_double_unlock(old_hash, new_hash);
+	unix_table_double_unlock(net, old_hash, new_hash);
 	err = 0;
 
 out:	mutex_unlock(&u->bindlock);
@@ -1138,6 +1155,7 @@  static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr,
 	       (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask());
 	unsigned int new_hash, old_hash = sk->sk_hash;
 	struct unix_sock *u = unix_sk(sk);
+	struct net *net = sock_net(sk);
 	struct user_namespace *ns; // barf...
 	struct unix_address *addr;
 	struct dentry *dentry;
@@ -1178,11 +1196,11 @@  static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr,
 		goto out_unlock;
 
 	new_hash = unix_bsd_hash(d_backing_inode(dentry));
-	unix_table_double_lock(old_hash, new_hash);
+	unix_table_double_lock(net, old_hash, new_hash);
 	u->path.mnt = mntget(parent.mnt);
 	u->path.dentry = dget(dentry);
 	__unix_set_addr_hash(sk, addr, new_hash);
-	unix_table_double_unlock(old_hash, new_hash);
+	unix_table_double_unlock(net, old_hash, new_hash);
 	mutex_unlock(&u->bindlock);
 	done_path_create(&parent, dentry);
 	return 0;
@@ -1205,6 +1223,7 @@  static int unix_bind_abstract(struct sock *sk, struct sockaddr_un *sunaddr,
 {
 	unsigned int new_hash, old_hash = sk->sk_hash;
 	struct unix_sock *u = unix_sk(sk);
+	struct net *net = sock_net(sk);
 	struct unix_address *addr;
 	int err;
 
@@ -1222,19 +1241,18 @@  static int unix_bind_abstract(struct sock *sk, struct sockaddr_un *sunaddr,
 	}
 
 	new_hash = unix_abstract_hash(addr->name, addr->len, sk->sk_type);
-	unix_table_double_lock(old_hash, new_hash);
+	unix_table_double_lock(net, old_hash, new_hash);
 
-	if (__unix_find_socket_byname(sock_net(sk), addr->name, addr->len,
-				      new_hash))
+	if (__unix_find_socket_byname(net, addr->name, addr->len, new_hash))
 		goto out_spin;
 
 	__unix_set_addr_hash(sk, addr, new_hash);
-	unix_table_double_unlock(old_hash, new_hash);
+	unix_table_double_unlock(net, old_hash, new_hash);
 	mutex_unlock(&u->bindlock);
 	return 0;
 
 out_spin:
-	unix_table_double_unlock(old_hash, new_hash);
+	unix_table_double_unlock(net, old_hash, new_hash);
 	err = -EADDRINUSE;
 out_mutex:
 	mutex_unlock(&u->bindlock);
@@ -3237,15 +3255,18 @@  static struct sock *unix_from_bucket(struct seq_file *seq, loff_t *pos)
 static struct sock *unix_get_first(struct seq_file *seq, loff_t *pos)
 {
 	unsigned long bucket = get_bucket(*pos);
+	struct net *net = seq_file_net(seq);
 	struct sock *sk;
 
 	while (bucket < UNIX_HASH_SIZE) {
 		spin_lock(&unix_table_locks[bucket]);
+		spin_lock(&net->unx.hash[bucket].lock);
 
 		sk = unix_from_bucket(seq, pos);
 		if (sk)
 			return sk;
 
+		spin_unlock(&net->unx.hash[bucket].lock);
 		spin_unlock(&unix_table_locks[bucket]);
 
 		*pos = set_bucket_offset(++bucket, 1);
@@ -3258,11 +3279,13 @@  static struct sock *unix_get_next(struct seq_file *seq, struct sock *sk,
 				  loff_t *pos)
 {
 	unsigned long bucket = get_bucket(*pos);
+	struct net *net = seq_file_net(seq);
 
 	for (sk = sk_next(sk); sk; sk = sk_next(sk))
-		if (sock_net(sk) == seq_file_net(seq))
+		if (sock_net(sk) == net)
 			return sk;
 
+	spin_unlock(&net->unx.hash[bucket].lock);
 	spin_unlock(&unix_table_locks[bucket]);
 
 	*pos = set_bucket_offset(++bucket, 1);
@@ -3292,8 +3315,10 @@  static void unix_seq_stop(struct seq_file *seq, void *v)
 {
 	struct sock *sk = v;
 
-	if (sk)
+	if (sk) {
+		spin_unlock(&seq_file_net(seq)->unx.hash[sk->sk_hash].lock);
 		spin_unlock(&unix_table_locks[sk->sk_hash]);
+	}
 }
 
 static int unix_seq_show(struct seq_file *seq, void *v)
@@ -3381,6 +3406,7 @@  static int bpf_iter_unix_hold_batch(struct seq_file *seq, struct sock *start_sk)
 
 {
 	struct bpf_unix_iter_state *iter = seq->private;
+	struct net *net = seq_file_net(seq);
 	unsigned int expected = 1;
 	struct sock *sk;
 
@@ -3388,7 +3414,7 @@  static int bpf_iter_unix_hold_batch(struct seq_file *seq, struct sock *start_sk)
 	iter->batch[iter->end_sk++] = start_sk;
 
 	for (sk = sk_next(start_sk); sk; sk = sk_next(sk)) {
-		if (sock_net(sk) != seq_file_net(seq))
+		if (sock_net(sk) != net)
 			continue;
 
 		if (iter->end_sk < iter->max_sk) {
@@ -3399,6 +3425,7 @@  static int bpf_iter_unix_hold_batch(struct seq_file *seq, struct sock *start_sk)
 		expected++;
 	}
 
+	spin_unlock(&net->unx.hash[start_sk->sk_hash].lock);
 	spin_unlock(&unix_table_locks[start_sk->sk_hash]);
 
 	return expected;
diff --git a/net/unix/diag.c b/net/unix/diag.c
index c5d1cca72aa5..41b67b82f51f 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -195,9 +195,9 @@  static int sk_diag_dump(struct sock *sk, struct sk_buff *skb, struct unix_diag_r
 
 static int unix_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
-	struct unix_diag_req *req;
-	int num, s_num, slot, s_slot;
 	struct net *net = sock_net(skb->sk);
+	int num, s_num, slot, s_slot;
+	struct unix_diag_req *req;
 
 	req = nlmsg_data(cb->nlh);
 
@@ -209,6 +209,7 @@  static int unix_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
 
 		num = 0;
 		spin_lock(&unix_table_locks[slot]);
+		spin_lock(&net->unx.hash[slot].lock);
 		sk_for_each(sk, &unix_socket_table[slot]) {
 			if (!net_eq(sock_net(sk), net))
 				continue;
@@ -220,12 +221,14 @@  static int unix_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
 					 NETLINK_CB(cb->skb).portid,
 					 cb->nlh->nlmsg_seq,
 					 NLM_F_MULTI) < 0) {
+				spin_unlock(&net->unx.hash[slot].lock);
 				spin_unlock(&unix_table_locks[slot]);
 				goto done;
 			}
 next:
 			num++;
 		}
+		spin_unlock(&net->unx.hash[slot].lock);
 		spin_unlock(&unix_table_locks[slot]);
 	}
 done:
@@ -235,19 +238,22 @@  static int unix_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
-static struct sock *unix_lookup_by_ino(unsigned int ino)
+static struct sock *unix_lookup_by_ino(struct net *net, unsigned int ino)
 {
 	struct sock *sk;
 	int i;
 
 	for (i = 0; i < UNIX_HASH_SIZE; i++) {
 		spin_lock(&unix_table_locks[i]);
+		spin_lock(&net->unx.hash[i].lock);
 		sk_for_each(sk, &unix_socket_table[i])
 			if (ino == sock_i_ino(sk)) {
 				sock_hold(sk);
+				spin_unlock(&net->unx.hash[i].lock);
 				spin_unlock(&unix_table_locks[i]);
 				return sk;
 			}
+		spin_unlock(&net->unx.hash[i].lock);
 		spin_unlock(&unix_table_locks[i]);
 	}
 	return NULL;
@@ -257,16 +263,17 @@  static int unix_diag_get_exact(struct sk_buff *in_skb,
 			       const struct nlmsghdr *nlh,
 			       struct unix_diag_req *req)
 {
-	int err = -EINVAL;
-	struct sock *sk;
-	struct sk_buff *rep;
-	unsigned int extra_len;
 	struct net *net = sock_net(in_skb->sk);
+	unsigned int extra_len;
+	struct sk_buff *rep;
+	struct sock *sk;
+	int err;
 
+	err = -EINVAL;
 	if (req->udiag_ino == 0)
 		goto out_nosk;
 
-	sk = unix_lookup_by_ino(req->udiag_ino);
+	sk = unix_lookup_by_ino(net, req->udiag_ino);
 	err = -ENOENT;
 	if (sk == NULL)
 		goto out_nosk;