Message ID | 1689248004-8158-1-git-send-email-akaher@vmware.com (mailing list archive) |
---|---|
Headers | show |
Series | tracing: introducing eventfs | expand |
On Thu, 13 Jul 2023 17:03:14 +0530 Ajay Kaher <akaher@vmware.com> wrote: > Events Tracing infrastructure contains lot of files, directories > (internally in terms of inodes, dentries). And ends up by consuming > memory in MBs. We can have multiple events of Events Tracing, which > further requires more memory. > > Instead of creating inodes/dentries, eventfs could keep meta-data and > skip the creation of inodes/dentries. As and when require, eventfs will > create the inodes/dentries only for required files/directories. > Also eventfs would delete the inodes/dentries once no more requires > but preserve the meta data. > > Tracing events took ~9MB, with this approach it took ~4.5MB > for ~10K files/dir. I think we are very close to getting this in for the next merge window. I ran several tests and so far it's holding up! I made a bunch of nits for this series, but nothing major. Mostly fixing up change logs and comments, as well as some naming conventions and reorganizing the series a little bit. Anyway, I'm hoping that v5 will be ready to go into linux-next. Thanks a lot Ajay for working on this! -- Steve > > v3: > Patch 3,4,5,7,9: > removed all the eventfs_rwsem code and replaced it with an srcu > lock for the readers, and a mutex to synchronize the writers of > the list. > > Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10 > > Patch 3: moved the struct eventfs_file and eventfs_inode into event_inode.c > as it really should not be exposed to all users. > > Patch 5: added a recursion check to eventfs_remove_rec() as it is really > dangerous to have unchecked recursion in the kernel (we do have > a fixed size stack). > > have the free use srcu callbacks. After the srcu grace periods > are done, it adds the eventfs_file onto a llist (lockless link > list) and wakes up a work queue. Then the work queue does the > freeing (this needs to be done in task/workqueue context, as > srcu callbacks are done in softirq context). > > Patch 6: renamed: > eventfs_create_file() -> create_file() > eventfs_create_dir() -> create_dir() > > v2: > Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM' > Patch 02: moved from v1 1/9 > Patch 03: moved from v1 2/9 > As suggested by Zheng Yejian, introduced eventfs_prepare_ef() > helper function to add files or directories to eventfs > fix WARNING reported by kernel test robot in v1 8/9 > Patch 04: moved from v1 3/9 > used eventfs_prepare_ef() to add files > fix WARNING reported by kernel test robot in v1 8/9 > Patch 05: moved from v1 4/9 > fix compiling warning reported by kernel test robot in v1 4/9 > Patch 06: moved from v1 5/9 > Patch 07: moved from v1 6/9 > Patch 08: moved from v1 7/9 > Patch 09: moved from v1 8/9 > rebased because of v3 01/10 > Patch 10: moved from v1 9/9 > > v1: > Patch 1: add header file > Patch 2: resolved kernel test robot issues > protecting eventfs lists using nested eventfs_rwsem > Patch 3: protecting eventfs lists using nested eventfs_rwsem > Patch 4: improve events cleanup code to fix crashes > Patch 5: resolved kernel test robot issues > removed d_instantiate_anon() calls > Patch 6: resolved kernel test robot issues > fix kprobe test in eventfs_root_lookup() > protecting eventfs lists using nested eventfs_rwsem > Patch 7: remove header file > Patch 8: pass eventfs_rwsem as argument to eventfs functions > called eventfs_remove_events_dir() instead of tracefs_remove() > from event_trace_del_tracer() > Patch 9: new patch to fix kprobe test case > > Ajay Kaher (9): > tracefs: Rename some tracefs function > eventfs: Implement eventfs dir creation functions > eventfs: Implement eventfs file add functions > eventfs: Implement eventfs file, directory remove function > eventfs: Implement functions to create eventfs files and directories > eventfs: Implement eventfs lookup, read, open functions > eventfs: Implement tracefs_inode_cache > eventfs: Move tracing/events to eventfs > test: ftrace: Fix kprobe test for eventfs > > Steven Rostedt (Google) (1): > tracing: Require all trace events to have a TRACE_SYSTEM > > fs/tracefs/Makefile | 1 + > fs/tracefs/event_inode.c | 711 ++++++++++++++++++ > fs/tracefs/inode.c | 124 ++- > fs/tracefs/internal.h | 25 + > include/linux/trace_events.h | 1 + > include/linux/tracefs.h | 32 + > kernel/trace/trace.h | 2 +- > kernel/trace/trace_events.c | 78 +- > .../ftrace/test.d/kprobe/kprobe_args_char.tc | 4 +- > .../test.d/kprobe/kprobe_args_string.tc | 4 +- > 10 files changed, 930 insertions(+), 52 deletions(-) > create mode 100644 fs/tracefs/event_inode.c > create mode 100644 fs/tracefs/internal.h >
> On 15-Jul-2023, at 4:28 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > > !! External Email > > On Thu, 13 Jul 2023 17:03:14 +0530 > Ajay Kaher <akaher@vmware.com> wrote: > >> Events Tracing infrastructure contains lot of files, directories >> (internally in terms of inodes, dentries). And ends up by consuming >> memory in MBs. We can have multiple events of Events Tracing, which >> further requires more memory. >> >> Instead of creating inodes/dentries, eventfs could keep meta-data and >> skip the creation of inodes/dentries. As and when require, eventfs will >> create the inodes/dentries only for required files/directories. >> Also eventfs would delete the inodes/dentries once no more requires >> but preserve the meta data. >> >> Tracing events took ~9MB, with this approach it took ~4.5MB >> for ~10K files/dir. > > I think we are very close to getting this in for the next merge window. I > ran several tests and so far it's holding up! > > I made a bunch of nits for this series, but nothing major. Mostly fixing up > change logs and comments, as well as some naming conventions and > reorganizing the series a little bit. > > Anyway, I'm hoping that v5 will be ready to go into linux-next. > > Thanks a lot Ajay for working on this! Thanks Steve, hopefully I will fix all the pending nits in v5. Here is the checkpatch.pl report: ./scripts/checkpatch.pl v4/* -------------------------- v4/0000-cover-letter.patch -------------------------- total: 0 errors, 0 warnings, 0 lines checked v4/0000-cover-letter.patch has no obvious style problems and is ready for submission. ------------------------------------------------------------------ v4/0001-tracing-Require-all-trace-events-to-have-a-TRACE_SYS.patch ------------------------------------------------------------------ total: 0 errors, 0 warnings, 22 lines checked v4/0001-tracing-Require-all-trace-events-to-have-a-TRACE_SYS.patch has no obvious style problems and is ready for submission. -------------------------------------------------- v4/0002-tracefs-Rename-some-tracefs-function.patch -------------------------------------------------- total: 0 errors, 0 warnings, 71 lines checked v4/0002-tracefs-Rename-some-tracefs-function.patch has no obvious style problems and is ready for submission. -------------------------------------------------------------- v4/0003-eventfs-Implement-eventfs-dir-creation-functions.patch -------------------------------------------------------------- WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? #52: new file mode 100644 WARNING: Symbolic permissions 'S_IRWXU | S_IRUGO | S_IXUGO' are not preferred. Consider using octal permissions '0755'. #194: FILE: fs/tracefs/event_inode.c:138: + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO; WARNING: Symbolic permissions 'S_IRWXU | S_IRUGO | S_IXUGO' are not preferred. Consider using octal permissions '0755'. #229: FILE: fs/tracefs/event_inode.c:173: + S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO, WARNING: Symbolic permissions 'S_IRWXU | S_IRUGO | S_IXUGO' are not preferred. Consider using octal permissions '0755'. #261: FILE: fs/tracefs/event_inode.c:205: + S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO, total: 0 errors, 4 warnings, 297 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. v4/0003-eventfs-Implement-eventfs-dir-creation-functions.patch has style problems, please review. ---------------------------------------------------------- v4/0004-eventfs-Implement-eventfs-file-add-functions.patch ---------------------------------------------------------- total: 0 errors, 0 warnings, 101 lines checked v4/0004-eventfs-Implement-eventfs-file-add-functions.patch has no obvious style problems and is ready for submission. ------------------------------------------------------------------ v4/0005-eventfs-Implement-eventfs-file-directory-remove-func.patch ------------------------------------------------------------------ total: 0 errors, 0 warnings, 129 lines checked v4/0005-eventfs-Implement-eventfs-file-directory-remove-func.patch has no obvious style problems and is ready for submission. ------------------------------------------------------------------ v4/0006-eventfs-Implement-functions-to-create-eventfs-files-.patch ------------------------------------------------------------------ total: 0 errors, 0 warnings, 182 lines checked v4/0006-eventfs-Implement-functions-to-create-eventfs-files-.patch has no obvious style problems and is ready for submission. ------------------------------------------------------------------ v4/0007-eventfs-Implement-eventfs-lookup-read-open-functions.patch ------------------------------------------------------------------ total: 0 errors, 0 warnings, 224 lines checked v4/0007-eventfs-Implement-eventfs-lookup-read-open-functions.patch has no obvious style problems and is ready for submission. --------------------------------------------------- v4/0008-eventfs-Implement-tracefs_inode_cache.patch --------------------------------------------------- total: 0 errors, 0 warnings, 68 lines checked v4/0008-eventfs-Implement-tracefs_inode_cache.patch has no obvious style problems and is ready for submission. ---------------------------------------------------- v4/0009-eventfs-Move-tracing-events-to-eventfs.patch ---------------------------------------------------- total: 0 errors, 0 warnings, 241 lines checked v4/0009-eventfs-Move-tracing-events-to-eventfs.patch has no obvious style problems and is ready for submission. ----------------------------------------------------- v4/0010-test-ftrace-Fix-kprobe-test-for-eventfs.patch ----------------------------------------------------- total: 0 errors, 0 warnings, 32 lines checked v4/0010-test-ftrace-Fix-kprobe-test-for-eventfs.patch has no obvious style problems and is ready for submission. -Ajay
On Sun, 16 Jul 2023 17:32:35 +0000 Ajay Kaher <akaher@vmware.com> wrote: > Thanks Steve, hopefully I will fix all the pending nits in v5. > Here is the checkpatch.pl report: Hold off on v5. I hit the following on v4: [ 220.170527] BUG: unable to handle page fault for address: fffffffffffffff0 [ 220.172792] #PF: supervisor read access in kernel mode [ 220.174618] #PF: error_code(0x0000) - not-present page [ 220.176516] PGD 13104d067 P4D 13104d067 PUD 13104f067 PMD 0 [ 220.178559] Oops: 0000 [#1] PREEMPT SMP PTI [ 220.180087] CPU: 3 PID: 35 Comm: kworker/u8:1 Not tainted 6.5.0-rc1-test-00021-gdd6e7af33766-dirty #15 [ 220.183441] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 220.186629] Workqueue: events_unbound eventfs_workfn [ 220.188286] RIP: 0010:eventfs_set_ef_status_free+0x17/0x40 [ 220.190091] Code: 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 48 8b 47 18 48 8b 40 30 48 83 f8 10 74 1b <f6> 40 f0 02 74 15 48 8b 47 78 48 85 c0 74 0c c6 40 5a 00 48 c7 40 [ 220.195360] RSP: 0018:ffffa731c0147e20 EFLAGS: 00010287 [ 220.196802] RAX: 0000000000000000 RBX: ffff97ca512ca000 RCX: 0000000000000000 [ 220.198703] RDX: 0000000000000001 RSI: ffff97ca52d18010 RDI: ffff97ca512ca000 [ 220.200540] RBP: ffff97ca52cb3780 R08: 0000000000000064 R09: 00000000802a0022 [ 220.202324] R10: 0000000000039e80 R11: ffff97cabffd5000 R12: ffff97ca512ca058 [ 220.204012] R13: ffff97ca52cb3780 R14: ffff97ca40153705 R15: ffffffffad5c1848 [ 220.205685] FS: 0000000000000000(0000) GS:ffff97cabbd80000(0000) knlGS:0000000000000000 [ 220.207476] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 220.208764] CR2: fffffffffffffff0 CR3: 000000010a01a001 CR4: 0000000000170ee0 [ 220.210342] Call Trace: [ 220.210879] <TASK> [ 220.211359] ? __die+0x23/0x70 [ 220.212036] ? page_fault_oops+0xa4/0x180 [ 220.212904] ? exc_page_fault+0xf6/0x190 [ 220.213738] ? asm_exc_page_fault+0x26/0x30 [ 220.214586] ? eventfs_set_ef_status_free+0x17/0x40 [ 220.216081] tracefs_dentry_iput+0x39/0x50 [ 220.217370] __dentry_kill+0xdc/0x170 [ 220.218581] dput+0x142/0x310 [ 220.219647] eventfs_workfn+0x42/0x70 [ 220.220805] process_one_work+0x1e2/0x3e0 [ 220.222031] worker_thread+0x1da/0x390 [ 220.223204] ? __pfx_worker_thread+0x10/0x10 [ 220.224476] kthread+0xf7/0x130 [ 220.225543] ? __pfx_kthread+0x10/0x10 [ 220.226735] ret_from_fork+0x2c/0x50 [ 220.227898] </TASK> [ 220.228792] Modules linked in: [ 220.229860] CR2: fffffffffffffff0 [ 220.230960] ---[ end trace 0000000000000000 ]--- I think I know the issue, and looking to see if I can fix it. -- Steve
> On 18-Jul-2023, at 7:10 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > > !! External Email > > On Sun, 16 Jul 2023 17:32:35 +0000 > Ajay Kaher <akaher@vmware.com> wrote: > >> Thanks Steve, hopefully I will fix all the pending nits in v5. >> Here is the checkpatch.pl report: > > Hold off on v5. I hit the following on v4: OK. > > [ 220.170527] BUG: unable to handle page fault for address: fffffffffffffff0 > [ 220.172792] #PF: supervisor read access in kernel mode > [ 220.174618] #PF: error_code(0x0000) - not-present page > [ 220.176516] PGD 13104d067 P4D 13104d067 PUD 13104f067 PMD 0 > [ 220.178559] Oops: 0000 [#1] PREEMPT SMP PTI > [ 220.180087] CPU: 3 PID: 35 Comm: kworker/u8:1 Not tainted 6.5.0-rc1-test-00021-gdd6e7af33766-dirty #15 > [ 220.183441] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > [ 220.186629] Workqueue: events_unbound eventfs_workfn > [ 220.188286] RIP: 0010:eventfs_set_ef_status_free+0x17/0x40 > [ 220.190091] Code: 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 48 8b 47 18 48 8b 40 30 48 83 f8 10 74 1b <f6> 40 f0 02 74 15 48 8b 47 78 48 85 c0 74 0c c6 40 5a 00 48 c7 40 > [ 220.195360] RSP: 0018:ffffa731c0147e20 EFLAGS: 00010287 > [ 220.196802] RAX: 0000000000000000 RBX: ffff97ca512ca000 RCX: 0000000000000000 > [ 220.198703] RDX: 0000000000000001 RSI: ffff97ca52d18010 RDI: ffff97ca512ca000 > [ 220.200540] RBP: ffff97ca52cb3780 R08: 0000000000000064 R09: 00000000802a0022 > [ 220.202324] R10: 0000000000039e80 R11: ffff97cabffd5000 R12: ffff97ca512ca058 > [ 220.204012] R13: ffff97ca52cb3780 R14: ffff97ca40153705 R15: ffffffffad5c1848 > [ 220.205685] FS: 0000000000000000(0000) GS:ffff97cabbd80000(0000) knlGS:0000000000000000 > [ 220.207476] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 220.208764] CR2: fffffffffffffff0 CR3: 000000010a01a001 CR4: 0000000000170ee0 > [ 220.210342] Call Trace: > [ 220.210879] <TASK> > [ 220.211359] ? __die+0x23/0x70 > [ 220.212036] ? page_fault_oops+0xa4/0x180 > [ 220.212904] ? exc_page_fault+0xf6/0x190 > [ 220.213738] ? asm_exc_page_fault+0x26/0x30 > [ 220.214586] ? eventfs_set_ef_status_free+0x17/0x40 > [ 220.216081] tracefs_dentry_iput+0x39/0x50 > [ 220.217370] __dentry_kill+0xdc/0x170 > [ 220.218581] dput+0x142/0x310 > [ 220.219647] eventfs_workfn+0x42/0x70 > [ 220.220805] process_one_work+0x1e2/0x3e0 > [ 220.222031] worker_thread+0x1da/0x390 > [ 220.223204] ? __pfx_worker_thread+0x10/0x10 > [ 220.224476] kthread+0xf7/0x130 > [ 220.225543] ? __pfx_kthread+0x10/0x10 > [ 220.226735] ret_from_fork+0x2c/0x50 > [ 220.227898] </TASK> > [ 220.228792] Modules linked in: > [ 220.229860] CR2: fffffffffffffff0 > [ 220.230960] ---[ end trace 0000000000000000 ]--- > > > I think I know the issue, and looking to see if I can fix it. - Is it also reproducible on v3? - Is it manually reproducible or reproducible using any specific script? Let me know if I can help. -Ajay
On Wed, 19 Jul 2023 10:25:28 +0000 Ajay Kaher <akaher@vmware.com> wrote: > - Is it also reproducible on v3? > - Is it manually reproducible or reproducible using any specific script? > > Let me know if I can help. Just tried it against v3, and it gave me the splat that I originally had and starting to fix, which now gives me another splat. I'll spend a couple more days on it and start sharing code and seeing if we can work together on this. Here's the reproducer (of both v3 splat and the bug I'm hitting now). ~# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events ~# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/ ~# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events v3 gives me (and my updates too) ====================================================== WARNING: possible circular locking dependency detected 6.5.0-rc1-test+ #576 Not tainted ------------------------------------------------------ trace-cmd/840 is trying to acquire lock: ffff8881007e5de0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}, at: dcache_dir_open_wrapper+0xc1/0x1b0 but task is already holding lock: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (eventfs_rwsem/1){.+.+}-{3:3}: down_read_nested+0x41/0x180 eventfs_root_lookup+0x42/0x120 __lookup_slow+0xff/0x1b0 walk_component+0xdb/0x150 path_lookupat+0x67/0x1a0 filename_lookup+0xe4/0x1f0 vfs_statx+0x9e/0x180 vfs_fstatat+0x51/0x70 __do_sys_newfstatat+0x3f/0x80 do_syscall_64+0x3a/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 -> #0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}: __lock_acquire+0x165d/0x2390 lock_acquire+0xd4/0x2d0 down_write+0x3b/0xd0 dcache_dir_open_wrapper+0xc1/0x1b0 do_dentry_open+0x20c/0x510 path_openat+0x7ad/0xc60 do_filp_open+0xaf/0x160 do_sys_openat2+0xab/0xe0 __x64_sys_openat+0x6a/0xa0 do_syscall_64+0x3a/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(eventfs_rwsem/1); lock(&sb->s_type->i_mutex_key#5); lock(eventfs_rwsem/1); lock(&sb->s_type->i_mutex_key#5); *** DEADLOCK *** 1 lock held by trace-cmd/840: #0: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0 stack backtrace: CPU: 7 PID: 840 Comm: trace-cmd Not tainted 6.5.0-rc1-test+ #576 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x57/0x90 check_noncircular+0x14b/0x160 __lock_acquire+0x165d/0x2390 lock_acquire+0xd4/0x2d0 ? dcache_dir_open_wrapper+0xc1/0x1b0 down_write+0x3b/0xd0 ? dcache_dir_open_wrapper+0xc1/0x1b0 dcache_dir_open_wrapper+0xc1/0x1b0 ? __pfx_dcache_dir_open_wrapper+0x10/0x10 do_dentry_open+0x20c/0x510 path_openat+0x7ad/0xc60 do_filp_open+0xaf/0x160 do_sys_openat2+0xab/0xe0 __x64_sys_openat+0x6a/0xa0 do_syscall_64+0x3a/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f1743267e41 Code: 44 24 18 31 c0 41 83 e2 40 75 3e 89 f0 25 00 00 41 00 3d 00 00 41 00 74 30 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 3f 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00 RSP: 002b:00007ffec10ff5d0 EFLAGS: 00000287 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1743267e41 RDX: 0000000000090800 RSI: 00007ffec10ffdb0 RDI: 00000000ffffff9c RBP: 00007ffec10ffda0 R08: 00007ffec11003e0 R09: 0000000000000040 R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffec11003e0 R13: 0000000000000040 R14: 0000000000000000 R15: 00007ffec110034b </TASK> I moved the code around a bit, and it appears that kprobes is getting dput() more than once. I moved the d_invalidate() and dput() into the workqueue function, and on kprobes going away, d_invalidate() frees it, and dput() is now corrupted. Still investigating. The VFS layer is a magic box that needs the right wizard hat to deal with, but I unfortunately am waiting on back order to retrieve that specific hat :-p -- Steve
> On 19-Jul-2023, at 7:53 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > > !! External Email > > On Wed, 19 Jul 2023 10:25:28 +0000 > Ajay Kaher <akaher@vmware.com> wrote: > >> - Is it also reproducible on v3? >> - Is it manually reproducible or reproducible using any specific script? >> >> Let me know if I can help. > > Just tried it against v3, and it gave me the splat that I originally had > and starting to fix, which now gives me another splat. I'll spend a couple > more days on it and start sharing code and seeing if we can work together > on this. > > Here's the reproducer (of both v3 splat and the bug I'm hitting now). > > ~# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events > ~# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/ > ~# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events I tried above steps on v4 but couldn’t reproduce: root@photon-6 [ ~/sdb/linux ]# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events root@photon-6 [ ~/sdb/linux ]# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/ enable filter format id trigger root@photon-6 [ ~/sdb/linux ]# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events -bash: echo: write error: No such file or directory I have doubt on call_srcu(), it may first end the grace period for parent then for child. If this is true then free_list will have unordered list and could cause problem. > > v3 gives me (and my updates too) > > ====================================================== > WARNING: possible circular locking dependency detected > 6.5.0-rc1-test+ #576 Not tainted > ------------------------------------------------------ > trace-cmd/840 is trying to acquire lock: > ffff8881007e5de0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}, at: dcache_dir_open_wrapper+0xc1/0x1b0 > > but task is already holding lock: > ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0 > > which lock already depends on the new lock. > > > the existing dependency chain (in reverse order) is: > > -> #1 (eventfs_rwsem/1){.+.+}-{3:3}: > down_read_nested+0x41/0x180 > eventfs_root_lookup+0x42/0x120 > __lookup_slow+0xff/0x1b0 > walk_component+0xdb/0x150 > path_lookupat+0x67/0x1a0 > filename_lookup+0xe4/0x1f0 > vfs_statx+0x9e/0x180 > vfs_fstatat+0x51/0x70 > __do_sys_newfstatat+0x3f/0x80 > do_syscall_64+0x3a/0xc0 > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > > -> #0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}: > __lock_acquire+0x165d/0x2390 > lock_acquire+0xd4/0x2d0 > down_write+0x3b/0xd0 > dcache_dir_open_wrapper+0xc1/0x1b0 > do_dentry_open+0x20c/0x510 > path_openat+0x7ad/0xc60 > do_filp_open+0xaf/0x160 > do_sys_openat2+0xab/0xe0 > __x64_sys_openat+0x6a/0xa0 > do_syscall_64+0x3a/0xc0 > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > > other info that might help us debug this: > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > rlock(eventfs_rwsem/1); > lock(&sb->s_type->i_mutex_key#5); > lock(eventfs_rwsem/1); > lock(&sb->s_type->i_mutex_key#5); > > *** DEADLOCK *** > > 1 lock held by trace-cmd/840: > #0: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0 > > stack backtrace: > CPU: 7 PID: 840 Comm: trace-cmd Not tainted 6.5.0-rc1-test+ #576 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > Call Trace: > <TASK> > dump_stack_lvl+0x57/0x90 > check_noncircular+0x14b/0x160 > __lock_acquire+0x165d/0x2390 > lock_acquire+0xd4/0x2d0 > ? dcache_dir_open_wrapper+0xc1/0x1b0 > down_write+0x3b/0xd0 > ? dcache_dir_open_wrapper+0xc1/0x1b0 > dcache_dir_open_wrapper+0xc1/0x1b0 > ? __pfx_dcache_dir_open_wrapper+0x10/0x10 > do_dentry_open+0x20c/0x510 > path_openat+0x7ad/0xc60 > do_filp_open+0xaf/0x160 > do_sys_openat2+0xab/0xe0 > __x64_sys_openat+0x6a/0xa0 > do_syscall_64+0x3a/0xc0 > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > RIP: 0033:0x7f1743267e41 > Code: 44 24 18 31 c0 41 83 e2 40 75 3e 89 f0 25 00 00 41 00 3d 00 00 41 00 74 30 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 3f 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00 > RSP: 002b:00007ffec10ff5d0 EFLAGS: 00000287 ORIG_RAX: 0000000000000101 > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1743267e41 > RDX: 0000000000090800 RSI: 00007ffec10ffdb0 RDI: 00000000ffffff9c > RBP: 00007ffec10ffda0 R08: 00007ffec11003e0 R09: 0000000000000040 > R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffec11003e0 > R13: 0000000000000040 R14: 0000000000000000 R15: 00007ffec110034b > </TASK> > This is expected from v3 (just ignore as of now), if eventfs_set_ef_status_free crash not reproduced on v3 then it’s v4 issue. -Ajay > > I moved the code around a bit, and it appears that kprobes is getting > dput() more than once. > > I moved the d_invalidate() and dput() into the workqueue function, and on > kprobes going away, d_invalidate() frees it, and dput() is now corrupted. > > Still investigating. The VFS layer is a magic box that needs the right > wizard hat to deal with, but I unfortunately am waiting on back order to > retrieve that specific hat :-p > > -- Steve > > !! External Email: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender.
On Wed, 19 Jul 2023 18:37:12 +0000 Ajay Kaher <akaher@vmware.com> wrote: > > Here's the reproducer (of both v3 splat and the bug I'm hitting now). > > > > ~# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events > > ~# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/ > > ~# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events > > I tried above steps on v4 but couldn’t reproduce: > > root@photon-6 [ ~/sdb/linux ]# echo 'p:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events > root@photon-6 [ ~/sdb/linux ]# ls /sys/kernel/debug/tracing/events/kprobes/sock_getattr/ > enable filter format id trigger > root@photon-6 [ ~/sdb/linux ]# echo '-:sock_getattr 0xffffffff9b55cef0 sk=%di' > /sys/kernel/tracing/kprobe_events > -bash: echo: write error: No such file or directory > > I have doubt on call_srcu(), it may first end the grace period for parent then for child. If this is true then free_list > will have unordered list and could cause problem. I modified the srcu portion a bit. Will post soon. I think I got something working. I'm having doubt that the dput()s were needed in the eventfs_remove_rec(), as the d_invalidate() appears to be enough. I'm still testing. > > > > > > v3 gives me (and my updates too) > > > > ====================================================== > > WARNING: possible circular locking dependency detected > > 6.5.0-rc1-test+ #576 Not tainted > > ------------------------------------------------------ > > trace-cmd/840 is trying to acquire lock: > > ffff8881007e5de0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}, at: dcache_dir_open_wrapper+0xc1/0x1b0 > > > > but task is already holding lock: > > ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0 > > > > which lock already depends on the new lock. > > > > > > the existing dependency chain (in reverse order) is: > > > > -> #1 (eventfs_rwsem/1){.+.+}-{3:3}: > > down_read_nested+0x41/0x180 > > eventfs_root_lookup+0x42/0x120 > > __lookup_slow+0xff/0x1b0 > > walk_component+0xdb/0x150 > > path_lookupat+0x67/0x1a0 > > filename_lookup+0xe4/0x1f0 > > vfs_statx+0x9e/0x180 > > vfs_fstatat+0x51/0x70 > > __do_sys_newfstatat+0x3f/0x80 > > do_syscall_64+0x3a/0xc0 > > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > > > > -> #0 (&sb->s_type->i_mutex_key#5){++++}-{3:3}: > > __lock_acquire+0x165d/0x2390 > > lock_acquire+0xd4/0x2d0 > > down_write+0x3b/0xd0 > > dcache_dir_open_wrapper+0xc1/0x1b0 > > do_dentry_open+0x20c/0x510 > > path_openat+0x7ad/0xc60 > > do_filp_open+0xaf/0x160 > > do_sys_openat2+0xab/0xe0 > > __x64_sys_openat+0x6a/0xa0 > > do_syscall_64+0x3a/0xc0 > > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > > > > other info that might help us debug this: > > > > Possible unsafe locking scenario: > > > > CPU0 CPU1 > > ---- ---- > > rlock(eventfs_rwsem/1); > > lock(&sb->s_type->i_mutex_key#5); > > lock(eventfs_rwsem/1); > > lock(&sb->s_type->i_mutex_key#5); > > > > *** DEADLOCK *** > > > > 1 lock held by trace-cmd/840: > > #0: ffff888103ad7e70 (eventfs_rwsem/1){.+.+}-{3:3}, at: dcache_dir_open_wrapper+0x6f/0x1b0 > > > > stack backtrace: > > CPU: 7 PID: 840 Comm: trace-cmd Not tainted 6.5.0-rc1-test+ #576 > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > > Call Trace: > > <TASK> > > dump_stack_lvl+0x57/0x90 > > check_noncircular+0x14b/0x160 > > __lock_acquire+0x165d/0x2390 > > lock_acquire+0xd4/0x2d0 > > ? dcache_dir_open_wrapper+0xc1/0x1b0 > > down_write+0x3b/0xd0 > > ? dcache_dir_open_wrapper+0xc1/0x1b0 > > dcache_dir_open_wrapper+0xc1/0x1b0 > > ? __pfx_dcache_dir_open_wrapper+0x10/0x10 > > do_dentry_open+0x20c/0x510 > > path_openat+0x7ad/0xc60 > > do_filp_open+0xaf/0x160 > > do_sys_openat2+0xab/0xe0 > > __x64_sys_openat+0x6a/0xa0 > > do_syscall_64+0x3a/0xc0 > > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > > RIP: 0033:0x7f1743267e41 > > Code: 44 24 18 31 c0 41 83 e2 40 75 3e 89 f0 25 00 00 41 00 3d 00 00 41 00 74 30 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 3f 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00 > > RSP: 002b:00007ffec10ff5d0 EFLAGS: 00000287 ORIG_RAX: 0000000000000101 > > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1743267e41 > > RDX: 0000000000090800 RSI: 00007ffec10ffdb0 RDI: 00000000ffffff9c > > RBP: 00007ffec10ffda0 R08: 00007ffec11003e0 R09: 0000000000000040 > > R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffec11003e0 > > R13: 0000000000000040 R14: 0000000000000000 R15: 00007ffec110034b > > </TASK> > > > > This is expected from v3 (just ignore as of now), if eventfs_set_ef_status_free crash not > reproduced on v3 then it’s v4 issue. The issue comes from fixing the above ;-) -- Steve
[ Resending because claws-mail is messing with the Cc again. It doesn't like quotes :-p ] On Wed, 19 Jul 2023 14:40:46 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > > I have doubt on call_srcu(), it may first end the grace period for parent then for child. If this is true then free_list > > will have unordered list and could cause problem. > > I modified the srcu portion a bit. Will post soon. I think I got something > working. > > I'm having doubt that the dput()s were needed in the eventfs_remove_rec(), > as the d_invalidate() appears to be enough. I'm still testing. OK, I got this working and it appears to pass all my tests. I actually got it working Wednesday night, but I tried a different approach on Thursday that got rid of the evenfs_file and only used eventfs_inodes and makes the files more dynamic. There's still a couple of corner cases that are not working with this approach (the dentry counters are getting out of sync). This should not stop this from going in. I'll continue working on that approach for the next merge cycle. But for now, here's the patch to this series that works. I'm going to post it here, and then reply to it with comments about my changes. -- Steve diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 4db048250cdb..2718de1533e6 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -36,16 +36,36 @@ struct eventfs_file { const struct file_operations *fop; const struct inode_operations *iop; union { + struct list_head del_list; struct rcu_head rcu; - struct llist_node llist; /* For freeing after RCU */ + unsigned long is_freed; /* Freed if one of the above is set */ }; void *data; umode_t mode; - bool created; + unsigned int flags; }; static DEFINE_MUTEX(eventfs_mutex); DEFINE_STATIC_SRCU(eventfs_srcu); + +static struct dentry *eventfs_root_lookup(struct inode *dir, + struct dentry *dentry, + unsigned int flags); +static int dcache_dir_open_wrapper(struct inode *inode, struct file *file); +static int eventfs_release(struct inode *inode, struct file *file); + +static const struct inode_operations eventfs_root_dir_inode_operations = { + .lookup = eventfs_root_lookup, +}; + +static const struct file_operations eventfs_file_operations = { + .open = dcache_dir_open_wrapper, + .read = generic_read_dir, + .iterate_shared = dcache_readdir, + .llseek = generic_file_llseek, + .release = eventfs_release, +}; + /** * create_file - create a file in the tracefs filesystem * @name: the name of the file to create. @@ -123,17 +143,12 @@ static struct dentry *create_file(const char *name, umode_t mode, * If tracefs is not enabled in the kernel, the value -%ENODEV will be * returned. */ -static struct dentry *create_dir(const char *name, umode_t mode, - struct dentry *parent, void *data, - const struct file_operations *fop, - const struct inode_operations *iop) +static struct dentry *create_dir(const char *name, struct dentry *parent, void *data) { struct tracefs_inode *ti; struct dentry *dentry; struct inode *inode; - WARN_ON(!S_ISDIR(mode)); - dentry = eventfs_start_creating(name, parent); if (IS_ERR(dentry)) return dentry; @@ -142,9 +157,9 @@ static struct dentry *create_dir(const char *name, umode_t mode, if (unlikely(!inode)) return eventfs_failed_creating(dentry); - inode->i_mode = mode; - inode->i_op = iop; - inode->i_fop = fop; + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO; + inode->i_op = &eventfs_root_dir_inode_operations; + inode->i_fop = &eventfs_file_operations; inode->i_private = data; ti = get_tracefs(inode); @@ -169,15 +184,27 @@ void eventfs_set_ef_status_free(struct dentry *dentry) struct tracefs_inode *ti_parent; struct eventfs_file *ef; + mutex_lock(&eventfs_mutex); ti_parent = get_tracefs(dentry->d_parent->d_inode); if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE)) - return; + goto out; ef = dentry->d_fsdata; if (!ef) - return; - ef->created = false; + goto out; + /* + * If ef was freed, then the LSB bit is set for d_fsdata. + * But this should not happen, as it should still have a + * ref count that prevents it. Warn in case it does. + */ + if (WARN_ON_ONCE((unsigned long)ef & 1)) + goto out; + + dentry->d_fsdata = NULL; + ef->dentry = NULL; + out: + mutex_unlock(&eventfs_mutex); } /** @@ -202,6 +229,79 @@ static void eventfs_post_create_dir(struct eventfs_file *ef) ti->private = ef->ei; } +static struct dentry * +create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup) +{ + bool invalidate = false; + struct dentry *dentry; + + mutex_lock(&eventfs_mutex); + if (ef->is_freed) { + mutex_unlock(&eventfs_mutex); + return NULL; + } + if (ef->dentry) { + dentry = ef->dentry; + /* On dir open, up the ref count */ + if (!lookup) + dget(dentry); + mutex_unlock(&eventfs_mutex); + return dentry; + } + mutex_unlock(&eventfs_mutex); + + if (!lookup) + inode_lock(parent->d_inode); + + if (ef->ei) + dentry = create_dir(ef->name, parent, ef->data); + else + dentry = create_file(ef->name, ef->mode, parent, + ef->data, ef->fop); + + if (!lookup) + inode_unlock(parent->d_inode); + + mutex_lock(&eventfs_mutex); + if (IS_ERR_OR_NULL(dentry)) { + /* If the ef was already updated get it */ + dentry = ef->dentry; + if (dentry && !lookup) + dget(dentry); + mutex_unlock(&eventfs_mutex); + return dentry; + } + + if (!ef->dentry && !ef->is_freed) { + ef->dentry = dentry; + if (ef->ei) + eventfs_post_create_dir(ef); + dentry->d_fsdata = ef; + } else { + /* A race here, should try again (unless freed) */ + invalidate = true; + } + mutex_unlock(&eventfs_mutex); + if (invalidate) + d_invalidate(dentry); + + if (lookup || invalidate) + dput(dentry); + + return invalidate ? NULL : dentry; +} + +static bool match_event_file(struct eventfs_file *ef, const char *name) +{ + bool ret; + + mutex_lock(&eventfs_mutex); + ret = !ef->is_freed && strcmp(ef->name, name) == 0; + mutex_unlock(&eventfs_mutex); + + return ret; +} + /** * eventfs_root_lookup - lookup routine to create file/dir * @dir: directory in which lookup to be done @@ -211,7 +311,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef) * Used to create dynamic file/dir with-in @dir, search with-in ei * list, if @dentry found go ahead and create the file/dir */ - static struct dentry *eventfs_root_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags) @@ -230,30 +329,10 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, idx = srcu_read_lock(&eventfs_srcu); list_for_each_entry_srcu(ef, &ei->e_top_files, list, srcu_read_lock_held(&eventfs_srcu)) { - if (strcmp(ef->name, dentry->d_name.name)) + if (!match_event_file(ef, dentry->d_name.name)) continue; ret = simple_lookup(dir, dentry, flags); - if (ef->created) - continue; - mutex_lock(&eventfs_mutex); - ef->created = true; - if (ef->ei) - ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent, - ef->data, ef->fop, ef->iop); - else - ef->dentry = create_file(ef->name, ef->mode, ef->d_parent, - ef->data, ef->fop); - - if (IS_ERR_OR_NULL(ef->dentry)) { - ef->created = false; - mutex_unlock(&eventfs_mutex); - } else { - if (ef->ei) - eventfs_post_create_dir(ef); - ef->dentry->d_fsdata = ef; - mutex_unlock(&eventfs_mutex); - dput(ef->dentry); - } + create_dentry(ef, ef->d_parent, true); break; } srcu_read_unlock(&eventfs_srcu, idx); @@ -270,6 +349,7 @@ static int eventfs_release(struct inode *inode, struct file *file) struct tracefs_inode *ti; struct eventfs_inode *ei; struct eventfs_file *ef; + struct dentry *dentry; int idx; ti = get_tracefs(inode); @@ -280,8 +360,11 @@ static int eventfs_release(struct inode *inode, struct file *file) idx = srcu_read_lock(&eventfs_srcu); list_for_each_entry_srcu(ef, &ei->e_top_files, list, srcu_read_lock_held(&eventfs_srcu)) { - if (ef->created) - dput(ef->dentry); + mutex_lock(&eventfs_mutex); + dentry = ef->dentry; + mutex_unlock(&eventfs_mutex); + if (dentry) + dput(dentry); } srcu_read_unlock(&eventfs_srcu, idx); return dcache_dir_close(inode, file); @@ -312,47 +395,12 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) ei = ti->private; idx = srcu_read_lock(&eventfs_srcu); list_for_each_entry_rcu(ef, &ei->e_top_files, list) { - if (ef->created) { - dget(ef->dentry); - continue; - } - mutex_lock(&eventfs_mutex); - ef->created = true; - - inode_lock(dentry->d_inode); - if (ef->ei) - ef->dentry = create_dir(ef->name, ef->mode, dentry, - ef->data, ef->fop, ef->iop); - else - ef->dentry = create_file(ef->name, ef->mode, dentry, - ef->data, ef->fop); - inode_unlock(dentry->d_inode); - - if (IS_ERR_OR_NULL(ef->dentry)) { - ef->created = false; - } else { - if (ef->ei) - eventfs_post_create_dir(ef); - ef->dentry->d_fsdata = ef; - } - mutex_unlock(&eventfs_mutex); + create_dentry(ef, dentry, false); } srcu_read_unlock(&eventfs_srcu, idx); return dcache_dir_open(inode, file); } -static const struct file_operations eventfs_file_operations = { - .open = dcache_dir_open_wrapper, - .read = generic_read_dir, - .iterate_shared = dcache_readdir, - .llseek = generic_file_llseek, - .release = eventfs_release, -}; - -static const struct inode_operations eventfs_root_dir_inode_operations = { - .lookup = eventfs_root_lookup, -}; - /** * eventfs_prepare_ef - helper function to prepare eventfs_file * @name: the name of the file/directory to create. @@ -470,11 +518,7 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name, ti_parent = get_tracefs(parent->d_inode); ei_parent = ti_parent->private; - ef = eventfs_prepare_ef(name, - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO, - &eventfs_file_operations, - &eventfs_root_dir_inode_operations, NULL); - + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL); if (IS_ERR(ef)) return ef; @@ -502,11 +546,7 @@ struct eventfs_file *eventfs_add_dir(const char *name, if (!ef_parent) return ERR_PTR(-EINVAL); - ef = eventfs_prepare_ef(name, - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO, - &eventfs_file_operations, - &eventfs_root_dir_inode_operations, NULL); - + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL); if (IS_ERR(ef)) return ef; @@ -601,37 +641,15 @@ int eventfs_add_file(const char *name, umode_t mode, return 0; } -static LLIST_HEAD(free_list); - -static void eventfs_workfn(struct work_struct *work) -{ - struct eventfs_file *ef, *tmp; - struct llist_node *llnode; - - llnode = llist_del_all(&free_list); - llist_for_each_entry_safe(ef, tmp, llnode, llist) { - if (ef->created && ef->dentry) - dput(ef->dentry); - kfree(ef->name); - kfree(ef->ei); - kfree(ef); - } -} - -DECLARE_WORK(eventfs_work, eventfs_workfn); - static void free_ef(struct rcu_head *head) { struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu); - if (!llist_add(&ef->llist, &free_list)) - return; - - queue_work(system_unbound_wq, &eventfs_work); + kfree(ef->name); + kfree(ef->ei); + kfree(ef); } - - /** * eventfs_remove_rec - remove eventfs dir or file from list * @ef: eventfs_file to be removed. @@ -639,7 +657,7 @@ static void free_ef(struct rcu_head *head) * This function recursively remove eventfs_file which * contains info of file or dir. */ -static void eventfs_remove_rec(struct eventfs_file *ef, int level) +static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level) { struct eventfs_file *ef_child; @@ -659,15 +677,12 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level) /* search for nested folders or files */ list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list, lockdep_is_held(&eventfs_mutex)) { - eventfs_remove_rec(ef_child, level + 1); + eventfs_remove_rec(ef_child, head, level + 1); } } - if (ef->created && ef->dentry) - d_invalidate(ef->dentry); - list_del_rcu(&ef->list); - call_srcu(&eventfs_srcu, &ef->rcu, free_ef); + list_add_tail(&ef->del_list, head); } /** @@ -678,12 +693,62 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level) */ void eventfs_remove(struct eventfs_file *ef) { + struct eventfs_file *tmp; + LIST_HEAD(ef_del_list); + struct dentry *dentry_list = NULL; + struct dentry *dentry; + if (!ef) return; mutex_lock(&eventfs_mutex); - eventfs_remove_rec(ef, 0); + eventfs_remove_rec(ef, &ef_del_list, 0); + + list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) { + if (ef->dentry) { + unsigned long ptr = (unsigned long)dentry_list; + + /* Keep the dentry from being freed yet */ + dget(ef->dentry); + + /* + * Paranoid: The dget() above should prevent the dentry + * from being freed and calling eventfs_set_ef_status_free(). + * But just in case, set the link list LSB pointer to 1 + * and have eventfs_set_ef_status_free() check that to + * make sure that if it does happen, it will not think + * the d_fsdata is an event_file. + * + * For this to work, no event_file should be allocated + * on a odd space, as the ef should always be allocated + * to be at least word aligned. Check for that too. + */ + WARN_ON_ONCE(ptr & 1); + + ef->dentry->d_fsdata = (void *)(ptr | 1); + dentry_list = ef->dentry; + ef->dentry = NULL; + } + call_srcu(&eventfs_srcu, &ef->rcu, free_ef); + } mutex_unlock(&eventfs_mutex); + + while (dentry_list) { + unsigned long ptr; + + dentry = dentry_list; + ptr = (unsigned long)dentry->d_fsdata & ~1UL; + dentry_list = (struct dentry *)ptr; + dentry->d_fsdata = NULL; + d_invalidate(dentry); + mutex_lock(&eventfs_mutex); + /* dentry should now have at least a single reference */ + WARN_ONCE((int)d_count(dentry) < 1, + "dentry %px less than one reference (%d) after invalidate\n", + dentry, d_count(dentry)); + mutex_unlock(&eventfs_mutex); + dput(dentry); + } } /** diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index c443a0c32a8c..1b880b5cd29d 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -22,4 +22,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry); struct dentry *tracefs_failed_creating(struct dentry *dentry); struct inode *tracefs_get_inode(struct super_block *sb); +void eventfs_set_ef_status_free(struct dentry *dentry); + #endif /* _TRACEFS_INTERNAL_H */ diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 4d30b0cafc5f..47c1b4d21735 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -51,8 +51,6 @@ void eventfs_remove(struct eventfs_file *ef); void eventfs_remove_events_dir(struct dentry *dentry); -void eventfs_set_ef_status_free(struct dentry *dentry); - struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
[ Resending because claws-mail is messing with the Cc again. It doesn't like quotes :-p ] On Fri, 21 Jul 2023 08:48:39 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c > index 4db048250cdb..2718de1533e6 100644 > --- a/fs/tracefs/event_inode.c > +++ b/fs/tracefs/event_inode.c > @@ -36,16 +36,36 @@ struct eventfs_file { > const struct file_operations *fop; > const struct inode_operations *iop; > union { > + struct list_head del_list; > struct rcu_head rcu; > - struct llist_node llist; /* For freeing after RCU */ > + unsigned long is_freed; /* Freed if one of the above is set */ I changed the freeing around. The dentries are freed before returning from eventfs_remove_dir(). I also added a "is_freed" field that is part of the union and is set if list elements have content. Note, since the union was criticized before, I will state the entire purpose of doing this patch set is to save memory. This structure will be used for every event file. What's the point of getting rid of dentries if we are replacing it with something just as big? Anyway, struct dentry does the exact same thing! > }; > void *data; > umode_t mode; > - bool created; > + unsigned int flags; Bah, I forgot to remove flags (one iteration replaced the created with flags to set both created and freed). I removed the freed with the above "is_freed" and noticed that created is set if and only if ef->dentry is set. So instead of using the created boolean, just test ef->dentry. The flags isn't used and can be removed. I just forgot to do so. > }; > > static DEFINE_MUTEX(eventfs_mutex); > DEFINE_STATIC_SRCU(eventfs_srcu); > + > +static struct dentry *eventfs_root_lookup(struct inode *dir, > + struct dentry *dentry, > + unsigned int flags); > +static int dcache_dir_open_wrapper(struct inode *inode, struct file *file); > +static int eventfs_release(struct inode *inode, struct file *file); > + > +static const struct inode_operations eventfs_root_dir_inode_operations = { > + .lookup = eventfs_root_lookup, > +}; > + > +static const struct file_operations eventfs_file_operations = { > + .open = dcache_dir_open_wrapper, > + .read = generic_read_dir, > + .iterate_shared = dcache_readdir, > + .llseek = generic_file_llseek, > + .release = eventfs_release, > +}; > + In preparing for getting rid of eventfs_file, I noticed that all directories are set to the above ops. In create_dir() instead of passing in ef->*ops, just use these directly. This does help with future work. > /** > * create_file - create a file in the tracefs filesystem > * @name: the name of the file to create. > @@ -123,17 +143,12 @@ static struct dentry *create_file(const char *name, umode_t mode, > * If tracefs is not enabled in the kernel, the value -%ENODEV will be > * returned. > */ > -static struct dentry *create_dir(const char *name, umode_t mode, > - struct dentry *parent, void *data, > - const struct file_operations *fop, > - const struct inode_operations *iop) > +static struct dentry *create_dir(const char *name, struct dentry *parent, void *data) > { As stated, the directories always used the same *op values, so I just hard coded it. > struct tracefs_inode *ti; > struct dentry *dentry; > struct inode *inode; > > - WARN_ON(!S_ISDIR(mode)); > - > dentry = eventfs_start_creating(name, parent); > if (IS_ERR(dentry)) > return dentry; > @@ -142,9 +157,9 @@ static struct dentry *create_dir(const char *name, umode_t mode, > if (unlikely(!inode)) > return eventfs_failed_creating(dentry); > > - inode->i_mode = mode; > - inode->i_op = iop; > - inode->i_fop = fop; > + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO; > + inode->i_op = &eventfs_root_dir_inode_operations; > + inode->i_fop = &eventfs_file_operations; > inode->i_private = data; > > ti = get_tracefs(inode); > @@ -169,15 +184,27 @@ void eventfs_set_ef_status_free(struct dentry *dentry) > struct tracefs_inode *ti_parent; > struct eventfs_file *ef; > > + mutex_lock(&eventfs_mutex); To synchronize with the removals, I needed to add locking here. > ti_parent = get_tracefs(dentry->d_parent->d_inode); > if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE)) > - return; > + goto out; > > ef = dentry->d_fsdata; > if (!ef) > - return; > - ef->created = false; > + goto out; > + /* > + * If ef was freed, then the LSB bit is set for d_fsdata. > + * But this should not happen, as it should still have a > + * ref count that prevents it. Warn in case it does. > + */ > + if (WARN_ON_ONCE((unsigned long)ef & 1)) > + goto out; During the remove, a dget() is done to keep the dentry from freeing. To make sure that it doesn't get freed, I added this test. > + > + dentry->d_fsdata = NULL; > + > ef->dentry = NULL; > + out: > + mutex_unlock(&eventfs_mutex); > } > > /** > @@ -202,6 +229,79 @@ static void eventfs_post_create_dir(struct eventfs_file *ef) > ti->private = ef->ei; > } > > +static struct dentry * > +create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup) > +{ Because both the lookup and the dir_open_wrapper did basically the same thing, I created a helper function so that I didn't have to update both locations. > + bool invalidate = false; > + struct dentry *dentry; > + > + mutex_lock(&eventfs_mutex); > + if (ef->is_freed) { > + mutex_unlock(&eventfs_mutex); > + return NULL; > + } Ignore if the ef is on its way to be freed. > + if (ef->dentry) { > + dentry = ef->dentry; If the ef already has a dentry (created) then use it. > + /* On dir open, up the ref count */ > + if (!lookup) > + dget(dentry); > + mutex_unlock(&eventfs_mutex); > + return dentry; > + } > + mutex_unlock(&eventfs_mutex); > + > + if (!lookup) > + inode_lock(parent->d_inode); > + > + if (ef->ei) > + dentry = create_dir(ef->name, parent, ef->data); > + else > + dentry = create_file(ef->name, ef->mode, parent, > + ef->data, ef->fop); > + > + if (!lookup) > + inode_unlock(parent->d_inode); > + > + mutex_lock(&eventfs_mutex); > + if (IS_ERR_OR_NULL(dentry)) { With the lock dropped, the dentry could have been created causing it to fail. Check if the ef->dentry exists, and if so, use it instead. Note, if the ef is freed, it should not have a dentry. > + /* If the ef was already updated get it */ > + dentry = ef->dentry; > + if (dentry && !lookup) > + dget(dentry); > + mutex_unlock(&eventfs_mutex); > + return dentry; > + } > + > + if (!ef->dentry && !ef->is_freed) { With the lock dropped, the dentry could have been filled too. If so, drop the created dentry and use the one owned by the ef->dentry. > + ef->dentry = dentry; > + if (ef->ei) > + eventfs_post_create_dir(ef); > + dentry->d_fsdata = ef; > + } else { > + /* A race here, should try again (unless freed) */ > + invalidate = true; I had a WARN_ON() once here. Probably could add a: WARN_ON_ONCE(!ef->is_freed); > + } > + mutex_unlock(&eventfs_mutex); > + if (invalidate) > + d_invalidate(dentry); > + > + if (lookup || invalidate) > + dput(dentry); > + > + return invalidate ? NULL : dentry; > +} > + > +static bool match_event_file(struct eventfs_file *ef, const char *name) > +{ A bit of a paranoid helper function. I wanted to make sure to synchronize with the removals. > + bool ret; > + > + mutex_lock(&eventfs_mutex); > + ret = !ef->is_freed && strcmp(ef->name, name) == 0; > + mutex_unlock(&eventfs_mutex); > + > + return ret; > +} > + > /** > * eventfs_root_lookup - lookup routine to create file/dir > * @dir: directory in which lookup to be done > @@ -211,7 +311,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef) > * Used to create dynamic file/dir with-in @dir, search with-in ei > * list, if @dentry found go ahead and create the file/dir > */ > - > static struct dentry *eventfs_root_lookup(struct inode *dir, > struct dentry *dentry, > unsigned int flags) > @@ -230,30 +329,10 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, > idx = srcu_read_lock(&eventfs_srcu); > list_for_each_entry_srcu(ef, &ei->e_top_files, list, > srcu_read_lock_held(&eventfs_srcu)) { > - if (strcmp(ef->name, dentry->d_name.name)) > + if (!match_event_file(ef, dentry->d_name.name)) > continue; > ret = simple_lookup(dir, dentry, flags); > - if (ef->created) > - continue; > - mutex_lock(&eventfs_mutex); > - ef->created = true; > - if (ef->ei) > - ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent, > - ef->data, ef->fop, ef->iop); > - else > - ef->dentry = create_file(ef->name, ef->mode, ef->d_parent, > - ef->data, ef->fop); > - > - if (IS_ERR_OR_NULL(ef->dentry)) { > - ef->created = false; > - mutex_unlock(&eventfs_mutex); > - } else { > - if (ef->ei) > - eventfs_post_create_dir(ef); > - ef->dentry->d_fsdata = ef; > - mutex_unlock(&eventfs_mutex); > - dput(ef->dentry); > - } > + create_dentry(ef, ef->d_parent, true); > break; > } > srcu_read_unlock(&eventfs_srcu, idx); > @@ -270,6 +349,7 @@ static int eventfs_release(struct inode *inode, struct file *file) > struct tracefs_inode *ti; > struct eventfs_inode *ei; > struct eventfs_file *ef; > + struct dentry *dentry; > int idx; > > ti = get_tracefs(inode); > @@ -280,8 +360,11 @@ static int eventfs_release(struct inode *inode, struct file *file) > idx = srcu_read_lock(&eventfs_srcu); > list_for_each_entry_srcu(ef, &ei->e_top_files, list, > srcu_read_lock_held(&eventfs_srcu)) { > - if (ef->created) > - dput(ef->dentry); > + mutex_lock(&eventfs_mutex); > + dentry = ef->dentry; > + mutex_unlock(&eventfs_mutex); > + if (dentry) > + dput(dentry); > } > srcu_read_unlock(&eventfs_srcu, idx); > return dcache_dir_close(inode, file); > @@ -312,47 +395,12 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) > ei = ti->private; > idx = srcu_read_lock(&eventfs_srcu); > list_for_each_entry_rcu(ef, &ei->e_top_files, list) { > - if (ef->created) { > - dget(ef->dentry); > - continue; > - } > - mutex_lock(&eventfs_mutex); > - ef->created = true; > - > - inode_lock(dentry->d_inode); > - if (ef->ei) > - ef->dentry = create_dir(ef->name, ef->mode, dentry, > - ef->data, ef->fop, ef->iop); > - else > - ef->dentry = create_file(ef->name, ef->mode, dentry, > - ef->data, ef->fop); > - inode_unlock(dentry->d_inode); > - > - if (IS_ERR_OR_NULL(ef->dentry)) { > - ef->created = false; > - } else { > - if (ef->ei) > - eventfs_post_create_dir(ef); > - ef->dentry->d_fsdata = ef; > - } > - mutex_unlock(&eventfs_mutex); > + create_dentry(ef, dentry, false); > } > srcu_read_unlock(&eventfs_srcu, idx); > return dcache_dir_open(inode, file); > } > > -static const struct file_operations eventfs_file_operations = { > - .open = dcache_dir_open_wrapper, > - .read = generic_read_dir, > - .iterate_shared = dcache_readdir, > - .llseek = generic_file_llseek, > - .release = eventfs_release, > -}; > - > -static const struct inode_operations eventfs_root_dir_inode_operations = { > - .lookup = eventfs_root_lookup, > -}; > - > /** > * eventfs_prepare_ef - helper function to prepare eventfs_file > * @name: the name of the file/directory to create. > @@ -470,11 +518,7 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name, > ti_parent = get_tracefs(parent->d_inode); > ei_parent = ti_parent->private; > > - ef = eventfs_prepare_ef(name, > - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO, > - &eventfs_file_operations, > - &eventfs_root_dir_inode_operations, NULL); > - > + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL); For directories, just use the hard coded values. > if (IS_ERR(ef)) > return ef; > > @@ -502,11 +546,7 @@ struct eventfs_file *eventfs_add_dir(const char *name, > if (!ef_parent) > return ERR_PTR(-EINVAL); > > - ef = eventfs_prepare_ef(name, > - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO, > - &eventfs_file_operations, > - &eventfs_root_dir_inode_operations, NULL); > - > + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL); ditto. > if (IS_ERR(ef)) > return ef; > > @@ -601,37 +641,15 @@ int eventfs_add_file(const char *name, umode_t mode, > return 0; > } > > -static LLIST_HEAD(free_list); > - > -static void eventfs_workfn(struct work_struct *work) > -{ > - struct eventfs_file *ef, *tmp; > - struct llist_node *llnode; > - > - llnode = llist_del_all(&free_list); > - llist_for_each_entry_safe(ef, tmp, llnode, llist) { > - if (ef->created && ef->dentry) > - dput(ef->dentry); > - kfree(ef->name); > - kfree(ef->ei); > - kfree(ef); > - } > -} > - > -DECLARE_WORK(eventfs_work, eventfs_workfn); > - > static void free_ef(struct rcu_head *head) > { > struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu); > > - if (!llist_add(&ef->llist, &free_list)) > - return; > - > - queue_work(system_unbound_wq, &eventfs_work); > + kfree(ef->name); > + kfree(ef->ei); > + kfree(ef); Since I did not do the dput() or d_invalidate() here I don't need call this from task context. This simplifies the process. > } > > - > - > /** > * eventfs_remove_rec - remove eventfs dir or file from list > * @ef: eventfs_file to be removed. > @@ -639,7 +657,7 @@ static void free_ef(struct rcu_head *head) > * This function recursively remove eventfs_file which > * contains info of file or dir. > */ > -static void eventfs_remove_rec(struct eventfs_file *ef, int level) > +static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level) > { > struct eventfs_file *ef_child; > > @@ -659,15 +677,12 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level) > /* search for nested folders or files */ > list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list, > lockdep_is_held(&eventfs_mutex)) { > - eventfs_remove_rec(ef_child, level + 1); > + eventfs_remove_rec(ef_child, head, level + 1); > } > } > > - if (ef->created && ef->dentry) > - d_invalidate(ef->dentry); > - > list_del_rcu(&ef->list); > - call_srcu(&eventfs_srcu, &ef->rcu, free_ef); > + list_add_tail(&ef->del_list, head); Hold off on freeing the ef. Add it to a link list to do so later. > } > > /** > @@ -678,12 +693,62 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level) > */ > void eventfs_remove(struct eventfs_file *ef) > { > + struct eventfs_file *tmp; > + LIST_HEAD(ef_del_list); > + struct dentry *dentry_list = NULL; > + struct dentry *dentry; > + > if (!ef) > return; > > mutex_lock(&eventfs_mutex); > - eventfs_remove_rec(ef, 0); > + eventfs_remove_rec(ef, &ef_del_list, 0); The above returns back with ef_del_list holding all the ef's to be freed. I probably could have just passed the dentry_list down instead, but I wanted the below complexity done in a non recursive function. > + > + list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) { > + if (ef->dentry) { > + unsigned long ptr = (unsigned long)dentry_list; > + > + /* Keep the dentry from being freed yet */ > + dget(ef->dentry); > + > + /* > + * Paranoid: The dget() above should prevent the dentry > + * from being freed and calling eventfs_set_ef_status_free(). > + * But just in case, set the link list LSB pointer to 1 > + * and have eventfs_set_ef_status_free() check that to > + * make sure that if it does happen, it will not think > + * the d_fsdata is an event_file. > + * > + * For this to work, no event_file should be allocated > + * on a odd space, as the ef should always be allocated > + * to be at least word aligned. Check for that too. > + */ > + WARN_ON_ONCE(ptr & 1); > + > + ef->dentry->d_fsdata = (void *)(ptr | 1); Set the d_fsdata to be a link list. The comment above needs to say to say struct eventfs_file and struct dentry should be word aligned. Anyway, while the eventfs_mutex is held, set all the dentries belonging to eventfs_files to the dentry_list and clear the ef->dentry. > + dentry_list = ef->dentry; > + ef->dentry = NULL; > + } > + call_srcu(&eventfs_srcu, &ef->rcu, free_ef); > + } > mutex_unlock(&eventfs_mutex); > + > + while (dentry_list) { > + unsigned long ptr; > + > + dentry = dentry_list; > + ptr = (unsigned long)dentry->d_fsdata & ~1UL; > + dentry_list = (struct dentry *)ptr; > + dentry->d_fsdata = NULL; With the mutex released, it is safe to free the dentries here. This also must be done before returning from this function, as when I had it done in the workqueue, it was failing some tests that would remove a dynamic event and still see that the directory was still around! > + d_invalidate(dentry); > + mutex_lock(&eventfs_mutex); > + /* dentry should now have at least a single reference */ > + WARN_ONCE((int)d_count(dentry) < 1, > + "dentry %px less than one reference (%d) after invalidate\n", I did update the above to: WARN_ONCE((int)d_count(dentry) < 1, "dentry %px (%s) less than one reference (%d) after invalidate\n", dentry, dentry->d_name.name, d_count(dentry)); To include the name of the dentry (my current work is triggering this still). > + dentry, d_count(dentry)); > + mutex_unlock(&eventfs_mutex); > + dput(dentry); > + } > } > > /** > diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h > index c443a0c32a8c..1b880b5cd29d 100644 > --- a/fs/tracefs/internal.h > +++ b/fs/tracefs/internal.h > @@ -22,4 +22,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry); > struct dentry *tracefs_failed_creating(struct dentry *dentry); > struct inode *tracefs_get_inode(struct super_block *sb); > > +void eventfs_set_ef_status_free(struct dentry *dentry); > + > #endif /* _TRACEFS_INTERNAL_H */ > diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h > index 4d30b0cafc5f..47c1b4d21735 100644 > --- a/include/linux/tracefs.h > +++ b/include/linux/tracefs.h > @@ -51,8 +51,6 @@ void eventfs_remove(struct eventfs_file *ef); > > void eventfs_remove_events_dir(struct dentry *dentry); > > -void eventfs_set_ef_status_free(struct dentry *dentry); > - Oh, and eventfs_set_ef_status_free() should not be exported to outside the tracefs system. -- Steve > struct dentry *tracefs_create_file(const char *name, umode_t mode, > struct dentry *parent, void *data, > const struct file_operations *fops);
> On Jul 21, 2023, at 6:19 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > >> union { >> + struct list_head del_list; >> struct rcu_head rcu; >> - struct llist_node llist; /* For freeing after RCU */ >> + unsigned long is_freed; /* Freed if one of the above is set */ > > I changed the freeing around. The dentries are freed before returning from > eventfs_remove_dir(). > > I also added a "is_freed" field that is part of the union and is set if > list elements have content. Note, since the union was criticized before, I > will state the entire purpose of doing this patch set is to save memory. > This structure will be used for every event file. What's the point of > getting rid of dentries if we are replacing it with something just as big? > Anyway, struct dentry does the exact same thing! Hey, don’t shoot me… [And admittedly, I didn’t review the whole series after v1.] I understand your position, but I think that at least is_freed should not be in the union, and you can just put it after umode_t. Even for the matter of size, it should not matter in most architectures since umode_t is 16-bit, as natural alignment is at least 32-bits. [ And “bool" is clearer type for is_freed. ]
On Fri, 21 Jul 2023 17:17:24 +0000 Nadav Amit <namit@vmware.com> wrote: > > I also added a "is_freed" field that is part of the union and is set if > > list elements have content. Note, since the union was criticized before, I > > will state the entire purpose of doing this patch set is to save memory. > > This structure will be used for every event file. What's the point of > > getting rid of dentries if we are replacing it with something just as big? > > Anyway, struct dentry does the exact same thing! > > Hey, don’t shoot me… ;-) > > [And admittedly, I didn’t review the whole series after v1.] > > I understand your position, but I think that at least is_freed should not > be in the union, and you can just put it after umode_t. > > Even for the matter of size, it should not matter in most architectures > since umode_t is 16-bit, as natural alignment is at least 32-bits. My new code I'm working on removes the umode_t and will require this then. I rather have that part tested before adding other drastic changes. -- Steve > > [ And “bool" is clearer type for is_freed. ] >
On Fri, 21 Jul 2023 17:17:24 +0000
Nadav Amit <namit@vmware.com> wrote:
> [ And “bool" is clearer type for is_freed. ]
Oh, and to answer why I didn't use bool. I want to make sure that it
doesn't use just part of the other elements in the union. If the compiler
decides to use one byte for the bool (which it is perfectly valid to do
so), and it happens to map over a zero of the other elements in the union,
then it will give a false negative.
By using unsigned long, it will be guaranteed to contain some content.
-- Steve
On Fri, 21 Jul 2023 09:18:43 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > OK, I got this working and it appears to pass all my tests. I actually got > it working Wednesday night, but I tried a different approach on Thursday > that got rid of the evenfs_file and only used eventfs_inodes and makes the > files more dynamic. There's still a couple of corner cases that are not > working with this approach (the dentry counters are getting out of sync). > This should not stop this from going in. I'll continue working on that > approach for the next merge cycle. But for now, here's the patch to this > series that works. Just figured out my bug with my new design. It was in the eventfs_release() code, where I have a loop that does the dput on the children, but I wasn't dput(child), I was dput(parent) in that loop!! Anyway, for this merge window I much rather have this code in, and for the next merge window I'll add this update. I can post the new design too, but first let's focus on this. -- Steve
On Fri, 21 Jul 2023 09:19:47 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > > + } else { > > + /* A race here, should try again (unless freed) */ > > + invalidate = true; > > I had a WARN_ON() once here. Probably could add a: > > WARN_ON_ONCE(!ef->is_freed); Yeah this should have a WARN_ON_ONCE() because the only way to get here with having a dentry and the ef->dentry being set is if we have two dentries with the same name in the same directory. Which should never happen. I think we can add: /* * Should never happen unless we get here due to being freed. * Otherwise it means two dentries exist with the same name. */ WARN_ON_ONCE(!ef->is_freed); > > > + } -- Steve
> On 22-Jul-2023, at 2:10 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > > !! External Email > > On Fri, 21 Jul 2023 09:19:47 -0400 > Steven Rostedt <rostedt@goodmis.org> wrote: > >>> + } else { >>> + /* A race here, should try again (unless freed) */ >>> + invalidate = true; >> >> I had a WARN_ON() once here. Probably could add a: >> >> WARN_ON_ONCE(!ef->is_freed); > > Yeah this should have a WARN_ON_ONCE() because the only way to get here > with having a dentry and the ef->dentry being set is if we have two > dentries with the same name in the same directory. Which should never > happen. > > I think we can add: > > /* > * Should never happen unless we get here due to being freed. > * Otherwise it means two dentries exist with the same name. > */ > WARN_ON_ONCE(!ef->is_freed); I missed WARN_ON_ONCE in v5 06/10, I will add in v6 06/10. - Ajay