
[net,0/2] af_unix: Garbage collector vs connect() race condition

Message ID 20240408161336.612064-1-mhal@rbox.co
Series af_unix: Garbage collector vs connect() race condition

Message

Michal Luczaj April 8, 2024, 3:58 p.m. UTC
The garbage collector does not account for the possibility of an embryo
being enqueued during garbage collection. If such an embryo has a peer
that carries SCM_RIGHTS, two consecutive passes of scan_children() may
see different sets of children. This leads to an incorrectly elevated
inflight count and, subsequently, a dangling pointer within
gc_inflight_list.

All sockets are AF_UNIX/SOCK_STREAM.
S is an unconnected socket.
L is a listening, in-flight socket bound to addr; it is not in the fdtable.
V's fd will be passed via sendmsg(), which bumps V's inflight count.
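For readers who want to reproduce the preconditions, here is a minimal userspace sketch of how L can end up listening and in flight while absent from the fdtable. It is an illustration under stated assumptions, not the code of gc_vs_connect.c: the helpers make_abstract_addr(), send_fd() and make_inflight_listener() and the abstract socket name are hypothetical, and the Linux-only abstract namespace is assumed. L's fd is sent over a socketpair with SCM_RIGHTS and then closed, so the only reference keeping L alive is the one in flight.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: build a Linux abstract-namespace address, return its length. */
static socklen_t make_abstract_addr(struct sockaddr_un *addr, const char *name)
{
    size_t n = strlen(name);

    assert(n + 1 <= sizeof(addr->sun_path));
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path + 1, name, n); /* sun_path[0] == '\0' marks "abstract" */
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + n);
}

/* Hypothetical helper: send one fd over a connected AF_UNIX socket via SCM_RIGHTS. */
static int send_fd(int via, int fd)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;               /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *cmsg;

    memset(ctrl.buf, 0, sizeof(ctrl.buf));
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(via, &msg, 0) == 1 ? 0 : -1;
}

/*
 * Hypothetical helper: create listener L bound to addr, put its fd in flight
 * on a socketpair, then close L's own fd. Returns the socketpair end whose
 * receive queue holds L (keep it open to keep L alive), or -1 on error.
 */
static int make_inflight_listener(const struct sockaddr_un *addr, socklen_t len)
{
    int sp[2];
    int l = socket(AF_UNIX, SOCK_STREAM, 0);

    if (l < 0)
        return -1;
    if (bind(l, (const struct sockaddr *)addr, len) < 0 || listen(l, 8) < 0 ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, sp) < 0) {
        close(l);
        return -1;
    }
    if (send_fd(sp[0], l) < 0) {
        close(sp[0]);
        close(sp[1]);
        close(l);
        return -1;
    }
    close(l);      /* L now lives only in the skb queued on sp[1] */
    close(sp[0]);
    return sp[1];  /* closing this would purge the skb and release L */
}
```

A subsequent connect() to the same address still succeeds, because L remains bound and listening for as long as the queued skb holds it.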

connect(S, addr)	sendmsg(S, [V]); close(V)	__unix_gc()
----------------	-------------------------	-----------

NS = unix_create1()
skb1 = sock_wmalloc(NS)
L = unix_find_other(addr)
unix_state_lock(L)
unix_peer(S) = NS
			// V count=1 inflight=0

			NS = unix_peer(S)
			skb2 = sock_alloc()
			skb_queue_tail(NS, skb2[V])

			// V became in-flight
			// V count=2 inflight=1

			close(V)

			// V count=1 inflight=1
			// GC candidate condition met

						for u in gc_inflight_list:
						  if (total_refs == inflight_refs)
						    add u to gc_candidates

						// gc_candidates={L, V}

						for u in gc_candidates:
						  scan_children(u, dec_inflight)

						// embryo (skb1) was not
						// reachable from L yet, so V's
						// inflight remains unchanged
__skb_queue_tail(L, skb1)
unix_state_unlock(L)
						for u in gc_candidates:
						  if (u.inflight)
						    scan_children(u, inc_inflight_move_tail)

						// V count=1 inflight=2 (!)
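The arithmetic in the diagram can be replayed with a toy userspace model. It is hypothetical and much simpler than net/unix/garbage.c: each socket carries only a file count and an inflight count, reachability is a boolean matrix, scan_children() is reduced to adjusting the inflight count of every candidate child, and the second pass is a worklist that loosely stands in for inc_inflight_move_tail(). Locking, gc_inflight_list and the hitlist are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: indices for the two sockets in the diagram. */
enum { L = 0, V = 1, NSOCK = 2 };

struct toy_sock {
    int total_refs;  /* file refcount, e.g. "V count=1" */
    int inflight;    /* references held by in-flight SCM_RIGHTS skbs */
    bool candidate;  /* selected because total_refs == inflight */
};

/*
 * reach[a][b] is true when scanning a (for a listener: its queued embryos
 * and their receive queues) finds an SCM_RIGHTS reference to b.
 */
typedef bool graph_t[NSOCK][NSOCK];

/* Simplified scan_children(): adjust inflight of every candidate child of u. */
static void scan_children(struct toy_sock *s, graph_t reach, int u, int delta)
{
    for (int c = 0; c < NSOCK; c++) {
        if (reach[u][c] && s[c].candidate) {
            s[c].inflight += delta;
            assert(s[c].inflight >= 0);
        }
    }
}

static void toy_gc(struct toy_sock *s, graph_t reach, bool embryo_lands_mid_gc)
{
    bool scanned[NSOCK] = { false };
    bool progress = true;

    for (int u = 0; u < NSOCK; u++)
        s[u].candidate = s[u].total_refs == s[u].inflight;

    for (int u = 0; u < NSOCK; u++)      /* pass 1: dec_inflight */
        if (s[u].candidate)
            scan_children(s, reach, u, -1);

    if (embryo_lands_mid_gc)             /* __skb_queue_tail(L, skb1) */
        reach[L][V] = true;

    while (progress) {                   /* pass 2: inc_inflight */
        progress = false;
        for (int u = 0; u < NSOCK; u++) {
            if (s[u].candidate && !scanned[u] && s[u].inflight) {
                scanned[u] = true;
                scan_children(s, reach, u, +1);
                progress = true;
            }
        }
    }
}

/* Returns V's inflight count after one GC run; V's file count stays 1. */
static int run_scenario(bool race)
{
    struct toy_sock s[NSOCK] = {
        [L] = { .total_refs = 1, .inflight = 1 },
        [V] = { .total_refs = 1, .inflight = 1 },
    };
    graph_t reach = { { false } };

    reach[L][V] = !race;  /* no race: embryo NS (holding V) already on L */
    toy_gc(s, reach, race);
    return s[V].inflight;
}
```

run_scenario(false), where the embryo is already queued on L when the GC starts, restores V's inflight count to 1. run_scenario(true) lets the embryo land between the two passes and leaves V with count=1 and inflight=2, the same state as the (!) line above.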

The idea behind the patch is to unix_state_lock()/unix_state_unlock() L
during gc_candidates selection. That guarantees that either connect() has
already concluded or, if we raced with it, a parallel sendmsg() with
SCM_RIGHTS cannot happen, because we are already holding unix_gc_lock.
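The ordering argument can be exercised with a userspace analogue. In this sketch pthread mutexes stand in for unix_state_lock(L) and unix_gc_lock, atomic flags stand in for the peer assignment and the embryo enqueue, and every identifier is hypothetical; it models the idea described above rather than the actual change to net/unix/garbage.c. The GC takes and drops L's lock while holding the GC lock, then checks that V is never seen in flight without the embryo that links it to L.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not kernel symbols. */
static pthread_mutex_t l_state_lock = PTHREAD_MUTEX_INITIALIZER; /* ~ unix_state_lock(L) */
static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;      /* ~ unix_gc_lock */
static atomic_int s_has_peer;    /* unix_peer(S) = NS has been published */
static atomic_int embryo_queued; /* skb1 (embryo NS) is on L's queue */
static int v_inflight;           /* protected by gc_lock */

/* connect(S, addr): publish the peer, then enqueue the embryo, under L's lock. */
static void *connect_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&l_state_lock);
    atomic_store(&s_has_peer, 1);
    sched_yield();                   /* widen the window, like mdelay(1) */
    atomic_store(&embryo_queued, 1);
    pthread_mutex_unlock(&l_state_lock);
    return NULL;
}

/* sendmsg(S, [V]): needs the peer, then bumps V's inflight under gc_lock. */
static void *sendmsg_thread(void *arg)
{
    (void)arg;
    while (!atomic_load(&s_has_peer))
        sched_yield();
    pthread_mutex_lock(&gc_lock);
    v_inflight++;
    pthread_mutex_unlock(&gc_lock);
    return NULL;
}

/* GC: with gc_lock held, lock/unlock L before inspecting the state. */
static bool gc_snapshot_consistent(void)
{
    bool ok;

    pthread_mutex_lock(&gc_lock);
    pthread_mutex_lock(&l_state_lock);   /* wait out an in-progress connect() */
    pthread_mutex_unlock(&l_state_lock);
    ok = v_inflight == 0 || atomic_load(&embryo_queued);
    pthread_mutex_unlock(&gc_lock);
    return ok;
}

/* One race round; returns whether the GC observed a consistent state. */
static bool run_round(void)
{
    pthread_t con, snd;
    bool ok;
    int rc;

    atomic_store(&s_has_peer, 0);
    atomic_store(&embryo_queued, 0);
    v_inflight = 0;                      /* no other threads are running here */

    rc = pthread_create(&snd, NULL, sendmsg_thread, NULL);
    assert(rc == 0);
    rc = pthread_create(&con, NULL, connect_thread, NULL);
    assert(rc == 0);
    (void)rc;

    ok = gc_snapshot_consistent();
    pthread_join(con, NULL);
    pthread_join(snd, NULL);
    return ok;
}
```

Dropping the lock/unlock pair from gc_snapshot_consistent() reopens the window: if sendmsg_thread() gets gc_lock while connect_thread() is between its two stores, the check can fail. On older glibc versions, link with -pthread.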

Running the reproducer with mdelay(1) inserted into unix_stream_connect()
results in immediate splats:

$ ./tools/testing/selftests/net/af_unix/gc_vs_connect
running
[   47.019387] WARNING: CPU: 3 PID: 12 at net/unix/garbage.c:284 __unix_gc+0x473/0x4a0
[   47.019405] Modules linked in: 9p netfs kvm_intel kvm 9pnet_virtio 9pnet i2c_piix4 zram crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_blk serio_raw fuse qemu_fw_cfg virtio_console
[   47.019419] CPU: 3 PID: 12 Comm: kworker/u32:1 Not tainted 6.9.0-rc2nokasan+ #1
[   47.019421] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[   47.019422] Workqueue: events_unbound __unix_gc
[   47.019424] RIP: 0010:__unix_gc+0x473/0x4a0
[   47.019425] Code: 8d bb d0 f9 ff ff 48 0f ba 73 58 01 0f b6 83 e2 f9 ff ff 31 d2 48 c7 c6 a0 c3 de 81 3c 0a 74 18 e8 e2 f8 ff ff e9 96 fd ff ff <0f> 0b e9 ef fb ff ff 0f 0b e9 39 fc ff ff e8 5a fa ff ff e9 7e fd
[   47.019427] RSP: 0018:ffffc9000006bd90 EFLAGS: 00010297
[   47.019429] RAX: 0000000000000002 RBX: ffff888109459680 RCX: 000000003342a6b0
[   47.019430] RDX: 0000000000000001 RSI: ffffffff82634e8c RDI: ffffffff83161740
[   47.019431] RBP: ffffc9000006be40 R08: 00000000000003d7 R09: 00000000000003d7
[   47.019432] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff831610e0
[   47.019433] R13: ffff888109459cb0 R14: ffffc9000006bd90 R15: ffffffff831616c0
[   47.019434] FS:  0000000000000000(0000) GS:ffff88842fb80000(0000) knlGS:0000000000000000
[   47.019435] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   47.019436] CR2: 000055c16e3893b8 CR3: 0000000002e37000 CR4: 0000000000750ef0
[   47.019438] PKRU: 55555554
[   47.019439] Call Trace:
[   47.019440]  <TASK>
[   47.019442]  ? __warn+0x88/0x180
[   47.019445]  ? __unix_gc+0x473/0x4a0
[   47.019447]  ? report_bug+0x189/0x1c0
[   47.019451]  ? handle_bug+0x38/0x70
[   47.019453]  ? exc_invalid_op+0x13/0x60
[   47.019454]  ? asm_exc_invalid_op+0x16/0x20
[   47.019460]  ? __unix_gc+0x473/0x4a0
[   47.019464]  ? lock_acquire+0xd5/0x2c0
[   47.019466]  ? lock_release+0x133/0x290
[   47.019471]  process_one_work+0x215/0x700
[   47.019473]  ? move_linked_works+0x70/0xa0
[   47.019477]  worker_thread+0x1ca/0x3b0
[   47.019479]  ? rescuer_thread+0x340/0x340
[   47.019480]  kthread+0xdd/0x110
[   47.019482]  ? kthread_complete_and_exit+0x20/0x20
[   47.019485]  ret_from_fork+0x2d/0x50
[   47.019487]  ? kthread_complete_and_exit+0x20/0x20
[   47.019489]  ret_from_fork_asm+0x11/0x20
[   47.019495]  </TASK>
[   47.019496] irq event stamp: 446769
[   47.019497] hardirqs last  enabled at (446775): [<ffffffff81199a69>] console_unlock+0xf9/0x120
[   47.019499] hardirqs last disabled at (446780): [<ffffffff81199a4e>] console_unlock+0xde/0x120
[   47.019501] softirqs last  enabled at (444340): [<ffffffff810fcd73>] __irq_exit_rcu+0x93/0x100
[   47.019504] softirqs last disabled at (444333): [<ffffffff810fcd73>] __irq_exit_rcu+0x93/0x100
[   47.019505] ---[ end trace 0000000000000000 ]---
...
[   47.019555] list_add corruption. prev->next should be next (ffffffff83161710), but was ffff888109459cb0. (prev=ffff888109459cb0).
[   47.019572] kernel BUG at lib/list_debug.c:32!
[   47.019654] invalid opcode: 0000 [#1] PREEMPT SMP
[   47.019676] CPU: 0 PID: 1057 Comm: gc_vs_connect Tainted: G        W          6.9.0-rc2nokasan+ #1
[   47.019698] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[   47.019720] RIP: 0010:__list_add_valid_or_report+0x70/0x90
[   47.019743] Code: 98 ff 0f 0b 48 89 c1 48 c7 c7 90 1a 72 82 e8 77 ed 98 ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 e8 1a 72 82 e8 60 ed 98 ff <0f> 0b 48 89 f2 48 89 c1 48 89 fe 48 c7 c7 40 1b 72 82 e8 49 ed 98
[   47.019766] RSP: 0018:ffffc9000125fba8 EFLAGS: 00010286
[   47.019790] RAX: 0000000000000075 RBX: ffff888109459680 RCX: 0000000000000000
[   47.019814] RDX: 0000000000000002 RSI: ffffffff82667570 RDI: 00000000ffffffff
[   47.019838] RBP: ffff8881067bc400 R08: 0000000000000000 R09: 0000000000000003
[   47.019861] R10: ffffc9000125fa78 R11: ffffffff82f571e8 R12: ffff888109459cb0
[   47.019883] R13: ffff888109459cb0 R14: ffff88810945ad00 R15: 0000000000000000
[   47.019906] FS:  00007fba6fde1680(0000) GS:ffff88842fa00000(0000) knlGS:0000000000000000
[   47.019928] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   47.019949] CR2: 00007fba6fc5ac80 CR3: 0000000104819000 CR4: 0000000000750ef0
[   47.019972] PKRU: 55555554
[   47.019993] Call Trace:
[   47.020014]  <TASK>
[   47.020034]  ? die+0x32/0x80
[   47.020056]  ? do_trap+0xd5/0x100
[   47.020078]  ? __list_add_valid_or_report+0x70/0x90
[   47.020100]  ? __list_add_valid_or_report+0x70/0x90
[   47.020122]  ? do_error_trap+0x81/0x110
[   47.020144]  ? __list_add_valid_or_report+0x70/0x90
[   47.020166]  ? exc_invalid_op+0x4c/0x60
[   47.020189]  ? __list_add_valid_or_report+0x70/0x90
[   47.020211]  ? asm_exc_invalid_op+0x16/0x20
[   47.020237]  ? __list_add_valid_or_report+0x70/0x90
[   47.020355]  ? __list_add_valid_or_report+0x70/0x90
[   47.020424]  unix_inflight+0x6a/0xf0
[   47.020483]  unix_scm_to_skb+0xe4/0x160
[   47.020518]  unix_stream_sendmsg+0x174/0x630
[   47.020573]  __sock_sendmsg+0x38/0x70
[   47.020635]  ____sys_sendmsg+0x237/0x2a0
[   47.020694]  ? import_iovec+0x16/0x20
[   47.020754]  ___sys_sendmsg+0x86/0xd0
[   47.020791]  ? find_held_lock+0x2b/0x80
[   47.021011]  ? lock_release+0x133/0x290
[   47.021467]  ? __fget_files+0xca/0x180
[   47.021564]  __sys_sendmsg+0x47/0x80
[   47.021663]  do_syscall_64+0x94/0x180
[   47.021751]  ? do_syscall_64+0xa1/0x180
[   47.021812]  ? do_syscall_64+0xa1/0x180
[   47.021864]  entry_SYSCALL_64_after_hwframe+0x46/0x4e
[   47.021890] RIP: 0033:0x7fba6fd15dbb
[   47.021917] Code: 48 89 e5 48 83 ec 20 89 55 ec 48 89 75 f0 89 7d f8 e8 69 2c f7 ff 8b 55 ec 48 8b 75 f0 41 89 c0 8b 7d f8 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2d 44 89 c7 48 89 45 f8 e8 c1 2c f7 ff 48 8b
[   47.021948] RSP: 002b:00007ffc1aa77410 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[   47.021974] RAX: ffffffffffffffda RBX: 00007ffc1aa775c8 RCX: 00007fba6fd15dbb
[   47.022000] RDX: 0000000000000000 RSI: 00005625cb5df0e0 RDI: 0000000000000003
[   47.022025] RBP: 00007ffc1aa77430 R08: 0000000000000000 R09: 0000000000001033
[   47.022050] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000001
[   47.022076] R13: 0000000000000000 R14: 00007fba6fe2c000 R15: 00005625cb5dedd8
[   47.022103]  </TASK>
[   47.022127] Modules linked in: 9p netfs kvm_intel kvm 9pnet_virtio 9pnet i2c_piix4 zram crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_blk serio_raw fuse qemu_fw_cfg virtio_console
[   47.022209] ---[ end trace 0000000000000000 ]---
[   47.022233] RIP: 0010:__list_add_valid_or_report+0x70/0x90
[   47.022258] Code: 98 ff 0f 0b 48 89 c1 48 c7 c7 90 1a 72 82 e8 77 ed 98 ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 e8 1a 72 82 e8 60 ed 98 ff <0f> 0b 48 89 f2 48 89 c1 48 89 fe 48 c7 c7 40 1b 72 82 e8 49 ed 98
[   47.022284] RSP: 0018:ffffc9000125fba8 EFLAGS: 00010286
[   47.022308] RAX: 0000000000000075 RBX: ffff888109459680 RCX: 0000000000000000
[   47.022332] RDX: 0000000000000002 RSI: ffffffff82667570 RDI: 00000000ffffffff
[   47.022356] RBP: ffff8881067bc400 R08: 0000000000000000 R09: 0000000000000003
[   47.022406] R10: ffffc9000125fa78 R11: ffffffff82f571e8 R12: ffff888109459cb0
[   47.022451] R13: ffff888109459cb0 R14: ffff88810945ad00 R15: 0000000000000000
[   47.022517] FS:  00007fba6fde1680(0000) GS:ffff88842fa00000(0000) knlGS:0000000000000000
[   47.022606] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   47.022653] CR2: 00007fba6fc5ac80 CR3: 0000000104819000 CR4: 0000000000750ef0
[   47.022698] PKRU: 55555554
[   47.022764] note: gc_vs_connect[1057] exited with preempt_count 1

Michal Luczaj (2):
  af_unix: Fix garbage collector racing against connect()
  af_unix: Add GC race reproducer + slow down unix_stream_connect()

 net/unix/af_unix.c                            |   2 +
 net/unix/garbage.c                            |  20 ++-
 tools/testing/selftests/net/af_unix/Makefile  |   2 +-
 .../selftests/net/af_unix/gc_vs_connect.c     | 158 ++++++++++++++++++
 4 files changed, 178 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/net/af_unix/gc_vs_connect.c