Message ID | 20240912091548.98132-1-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [RFC] vfs: Introduce a new open flag to imply dentry deletion on file removal | expand |
On Thu 12-09-24 17:15:48, Yafang Shao wrote: > Commit 681ce8623567 ("vfs: Delete the associated dentry when deleting a > file") introduced an unconditional deletion of the associated dentry when a > file is removed. However, this led to performance regressions in specific > benchmarks, such as ilebench.sum_operations/s [0], prompting a revert in > commit 4a4be1ad3a6e ("Revert 'vfs: Delete the associated dentry when > deleting a file'"). > > This patch seeks to reintroduce the concept conditionally, where the > associated dentry is deleted only when the user explicitly opts for it > during file removal. > > There are practical use cases for this proactive dentry reclamation. > Besides the Elasticsearch use case mentioned in commit 681ce8623567, > additional examples have surfaced in our production environment. For > instance, in video rendering services that continuously generate temporary > files, upload them to persistent storage servers, and then delete them, a > large number of negative dentries—serving no useful purpose—accumulate. > Users in such cases would benefit from proactively reclaiming these > negative dentries. This patch provides an API allowing users to actively > delete these unnecessary negative dentries. > > Link: https://lore.kernel.org/linux-fsdevel/202405291318.4dfbb352-oliver.sang@intel.com [0] > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: Christian Brauner <brauner@kernel.org> > Cc: Jan Kara <jack@suse.cz> Umm, I don't think we want to burn a FMODE flag and expose these details of dentry reclaim to userspace. BTW, if we wanted to do this, we already have d_mark_dontcache() for in-kernel users which we could presumably reuse. But I would not completely give up on trying to handle this in an automated way inside the kernel. The original solution you mention above was perhaps too aggressive but maybe d_delete() could just mark the dentry with a "deleted" flag, retain_dentry() would move such dentries into a special LRU list which we'd prune once in a while (say once per 5 s). Also this list would be automatically pruned from prune_dcache_sb(). This way if there's quick reuse of a dentry, it will get reused and no harm is done, if it isn't quickly reused, we'll free them to not waste memory. What do people think about such scheme? Honza > --- > fs/dcache.c | 7 ++++++- > fs/open.c | 9 ++++++++- > include/linux/dcache.h | 2 +- > include/linux/sched.h | 2 +- > include/uapi/asm-generic/fcntl.h | 4 ++++ > 5 files changed, 20 insertions(+), 4 deletions(-) > > diff --git a/fs/dcache.c b/fs/dcache.c > index 3d8daaecb6d1..6d744b5e5a6c 100644 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -1667,7 +1667,10 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name) > smp_store_release(&dentry->d_name.name, dname); /* ^^^ */ > > dentry->d_lockref.count = 1; > - dentry->d_flags = 0; > + if (current->flags & PF_REMOVE_DENTRY) > + dentry->d_flags = DCACHE_FILE_REMOVE; > + else > + dentry->d_flags = 0; > spin_lock_init(&dentry->d_lock); > seqcount_spinlock_init(&dentry->d_seq, &dentry->d_lock); > dentry->d_inode = NULL; > @@ -2394,6 +2397,8 @@ void d_delete(struct dentry * dentry) > * Are we the only user? > */ > if (dentry->d_lockref.count == 1) { > + if (dentry->d_flags & DCACHE_FILE_REMOVE) > + __d_drop(dentry); > dentry->d_flags &= ~DCACHE_CANT_MOUNT; > dentry_unlink_inode(dentry); > } else { > diff --git a/fs/open.c b/fs/open.c > index 22adbef7ecc2..3441a004a841 100644 > --- a/fs/open.c > +++ b/fs/open.c > @@ -1428,7 +1428,14 @@ static long do_sys_openat2(int dfd, const char __user *filename, > long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode) > { > struct open_how how = build_open_how(flags, mode); > - return do_sys_openat2(dfd, filename, &how); > + long err; > + > + if (flags & O_NODENTRY) > + current->flags |= PF_REMOVE_DENTRY; > + err = do_sys_openat2(dfd, filename, &how); > + if (flags & O_NODENTRY) > + current->flags &= ~PF_REMOVE_DENTRY; > + return err; > } > > > diff --git a/include/linux/dcache.h b/include/linux/dcache.h > index bff956f7b2b9..82ba79bc0072 100644 > --- a/include/linux/dcache.h > +++ b/include/linux/dcache.h > @@ -215,7 +215,7 @@ struct dentry_operations { > > #define DCACHE_NOKEY_NAME BIT(25) /* Encrypted name encoded without key */ > #define DCACHE_OP_REAL BIT(26) > - > +#define DCACHE_FILE_REMOVE BIT(27) /* remove this dentry when file is removed */ > #define DCACHE_PAR_LOOKUP BIT(28) /* being looked up (with parent locked shared) */ > #define DCACHE_DENTRY_CURSOR BIT(29) > #define DCACHE_NORCU BIT(30) /* No RCU delay for freeing */ > diff --git a/include/linux/sched.h b/include/linux/sched.h > index f8d150343d42..f931a3a882e0 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1649,7 +1649,7 @@ extern struct pid *cad_pid; > #define PF_USED_MATH 0x00002000 /* If unset the fpu must be initialized before use */ > #define PF_USER_WORKER 0x00004000 /* Kernel thread cloned from userspace thread */ > #define PF_NOFREEZE 0x00008000 /* This thread should not be frozen */ > -#define PF__HOLE__00010000 0x00010000 > +#define PF_REMOVE_DENTRY 0x00010000 /* Remove the dentry when the file is removed */ > #define PF_KSWAPD 0x00020000 /* I am kswapd */ > #define PF_MEMALLOC_NOFS 0x00040000 /* All allocations inherit GFP_NOFS. See memalloc_nfs_save() */ > #define PF_MEMALLOC_NOIO 0x00080000 /* All allocations inherit GFP_NOIO. See memalloc_noio_save() */ > diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h > index 80f37a0d40d7..ca5f402d5e7d 100644 > --- a/include/uapi/asm-generic/fcntl.h > +++ b/include/uapi/asm-generic/fcntl.h > @@ -89,6 +89,10 @@ > #define __O_TMPFILE 020000000 > #endif > > +#ifndef O_NODENTRY > +#define O_NODENTRY 040000000 > +#endif > + > /* a horrid kludge trying to make sure that this will fail on old kernels */ > #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY) > > -- > 2.43.5 >
On Thu, Sep 12, 2024 at 12:53:40PM +0200, Jan Kara wrote: > On Thu 12-09-24 17:15:48, Yafang Shao wrote: > > This patch seeks to reintroduce the concept conditionally, where the > > associated dentry is deleted only when the user explicitly opts for it > > during file removal. > > > > Umm, I don't think we want to burn a FMODE flag and expose these details of > dentry reclaim to userspace. BTW, if we wanted to do this, we already have > d_mark_dontcache() for in-kernel users which we could presumably reuse. > I don't believe any mechanism letting userspace hint at what to do with a dentry is warranted at this point. > But I would not completely give up on trying to handle this in an automated > way inside the kernel. The original solution you mention above was perhaps > too aggressive but maybe d_delete() could just mark the dentry with a > "deleted" flag, retain_dentry() would move such dentries into a special LRU > list which we'd prune once in a while (say once per 5 s). Also this list > would be automatically pruned from prune_dcache_sb(). This way if there's > quick reuse of a dentry, it will get reused and no harm is done, if it > isn't quickly reused, we'll free them to not waste memory. > > What do people think about such scheme? > I have to note what to do with a dentry after unlink is merely a subset of the general problem of what to do about negative entries. I had a look at it $elsewhere some years back and as one might suspect userspace likes to do counterproductive shit. For example it is going to stat a non-existent path 2-3 times and then open(..., O_CREAT) on it. I don't have numbers handy and someone(tm) will need to re-evaluate, but crux of the findings was as follows: - there is a small subset of negative entries which keep getting tons of hits - a sizeable count literally does not get any hits after being created (aka wastes memory) - some negative entries get 2-3 hits and get converted into a positive entry afterwards (see that stat shitter) - some flip flop with deletion/creation So whatever magic mechanism, if it wants to mostly not get in the way in terms of performance, will have to account for the above. I ended up with a kludge where negative entries hang out on some number of LRU lists and get promoted to a hot list if they manage to get some number of hits. The hot list is merely a FIFO and entries there no longer count any hits. Removal from the cold LRU also demotes an entry from the hot list. The total count is limited and if you want to create a negative dentry you have to whack one from the LRU. This is not perfect by any means but manages to succesfully separate the high churn entries from the one which are likely to stay in the long run. Definitely something to tinker with. If I read the original problem correctly this would be sorted out as a side effect by limiting how many entries are there to evict to begin with. I'm not signing up to do squat though. :)
On Thu, Sep 12, 2024 at 12:53:40PM GMT, Jan Kara wrote: > On Thu 12-09-24 17:15:48, Yafang Shao wrote: > > Commit 681ce8623567 ("vfs: Delete the associated dentry when deleting a > > file") introduced an unconditional deletion of the associated dentry when a > > file is removed. However, this led to performance regressions in specific > > benchmarks, such as ilebench.sum_operations/s [0], prompting a revert in > > commit 4a4be1ad3a6e ("Revert 'vfs: Delete the associated dentry when > > deleting a file'"). > > > > This patch seeks to reintroduce the concept conditionally, where the > > associated dentry is deleted only when the user explicitly opts for it > > during file removal. > > > > There are practical use cases for this proactive dentry reclamation. > > Besides the Elasticsearch use case mentioned in commit 681ce8623567, > > additional examples have surfaced in our production environment. For > > instance, in video rendering services that continuously generate temporary > > files, upload them to persistent storage servers, and then delete them, a > > large number of negative dentries—serving no useful purpose—accumulate. > > Users in such cases would benefit from proactively reclaiming these > > negative dentries. This patch provides an API allowing users to actively > > delete these unnecessary negative dentries. > > > > Link: https://lore.kernel.org/linux-fsdevel/202405291318.4dfbb352-oliver.sang@intel.com [0] > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > Cc: Linus Torvalds <torvalds@linux-foundation.org> > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > Cc: Christian Brauner <brauner@kernel.org> > > Cc: Jan Kara <jack@suse.cz> > > Umm, I don't think we want to burn a FMODE flag and expose these details of > dentry reclaim to userspace. BTW, if we wanted to do this, we already have > d_mark_dontcache() for in-kernel users which we could presumably reuse. > > But I would not completely give up on trying to handle this in an automated > way inside the kernel. The original solution you mention above was perhaps > too aggressive but maybe d_delete() could just mark the dentry with a > "deleted" flag, retain_dentry() would move such dentries into a special LRU > list which we'd prune once in a while (say once per 5 s). Also this list > would be automatically pruned from prune_dcache_sb(). This way if there's > quick reuse of a dentry, it will get reused and no harm is done, if it > isn't quickly reused, we'll free them to not waste memory. > > What do people think about such scheme? I agree that a new open flag is not the right way to go and it also wastes a PF_* flag. I think it'll probably be rather difficult to ensure that we don't see performance regressions for some benchmark. Plus there will be users that want aggressive negative dentry reclaim being fine with possibly increased lookup cost independent of the directory case. So imo the simple option is to add 681ce8623567 back behind a sysctl that users that need aggressive negative dentry reclaim can simply turn on.
On Thu, Sep 12, 2024 at 8:04 PM Christian Brauner <brauner@kernel.org> wrote: > > On Thu, Sep 12, 2024 at 12:53:40PM GMT, Jan Kara wrote: > > On Thu 12-09-24 17:15:48, Yafang Shao wrote: > > > Commit 681ce8623567 ("vfs: Delete the associated dentry when deleting a > > > file") introduced an unconditional deletion of the associated dentry when a > > > file is removed. However, this led to performance regressions in specific > > > benchmarks, such as ilebench.sum_operations/s [0], prompting a revert in > > > commit 4a4be1ad3a6e ("Revert 'vfs: Delete the associated dentry when > > > deleting a file'"). > > > > > > This patch seeks to reintroduce the concept conditionally, where the > > > associated dentry is deleted only when the user explicitly opts for it > > > during file removal. > > > > > > There are practical use cases for this proactive dentry reclamation. > > > Besides the Elasticsearch use case mentioned in commit 681ce8623567, > > > additional examples have surfaced in our production environment. For > > > instance, in video rendering services that continuously generate temporary > > > files, upload them to persistent storage servers, and then delete them, a > > > large number of negative dentries—serving no useful purpose—accumulate. > > > Users in such cases would benefit from proactively reclaiming these > > > negative dentries. This patch provides an API allowing users to actively > > > delete these unnecessary negative dentries. > > > > > > Link: https://lore.kernel.org/linux-fsdevel/202405291318.4dfbb352-oliver.sang@intel.com [0] > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > > Cc: Linus Torvalds <torvalds@linux-foundation.org> > > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > > Cc: Christian Brauner <brauner@kernel.org> > > > Cc: Jan Kara <jack@suse.cz> > > > > Umm, I don't think we want to burn a FMODE flag and expose these details of > > dentry reclaim to userspace. BTW, if we wanted to do this, we already have > > d_mark_dontcache() for in-kernel users which we could presumably reuse. > > > > But I would not completely give up on trying to handle this in an automated > > way inside the kernel. The original solution you mention above was perhaps > > too aggressive but maybe d_delete() could just mark the dentry with a > > "deleted" flag, retain_dentry() would move such dentries into a special LRU > > list which we'd prune once in a while (say once per 5 s). Also this list > > would be automatically pruned from prune_dcache_sb(). This way if there's > > quick reuse of a dentry, it will get reused and no harm is done, if it > > isn't quickly reused, we'll free them to not waste memory. > > > > What do people think about such scheme? > > I agree that a new open flag is not the right way to go and it also > wastes a PF_* flag. > > I think it'll probably be rather difficult to ensure that we don't see > performance regressions for some benchmark. Plus there will be users > that want aggressive negative dentry reclaim being fine with possibly > increased lookup cost independent of the directory case. > > So imo the simple option is to add 681ce8623567 back behind a sysctl > that users that need aggressive negative dentry reclaim can simply turn on. We are currently using the sysctl method on our production servers. This feature has been deployed across many of our production environments for several months without any reported regressions.
On Thu, Sep 12, 2024 at 01:36:45PM +0200, Mateusz Guzik wrote: > I have to note what to do with a dentry after unlink is merely a subset > of the general problem of what to do about negative entries. I had a > look at it $elsewhere some years back and as one might suspect userspace > likes to do counterproductive shit. For example it is going to stat a > non-existent path 2-3 times and then open(..., O_CREAT) on it. > > I don't have numbers handy and someone(tm) will need to re-evaluate, but > crux of the findings was as follows: > - there is a small subset of negative entries which keep getting tons of > hits > - a sizeable count literally does not get any hits after being created > (aka wastes memory) > - some negative entries get 2-3 hits and get converted into a positive > entry afterwards (see that stat shitter) > - some flip flop with deletion/creation > > So whatever magic mechanism, if it wants to mostly not get in the way in > terms of performance, will have to account for the above. > > I ended up with a kludge where negative entries hang out on some number > of LRU lists and get promoted to a hot list if they manage to get some > number of hits. The hot list is merely a FIFO and entries there no > longer count any hits. Removal from the cold LRU also demotes an entry > from the hot list. This all reminds me of that paper you pointed me at. https://arxiv.org/pdf/1512.00727 Summary for the impatient: Use 10% of the memory for a "not yet proven to be useful" entries, and use multiple Bloom filters to decide which ones are sufficiently useful to be added to the "more permanent" cache. I don't think it solves every problem our current dcache implementation has, but I'm 90% sure it'll be a huge improvement.
diff --git a/fs/dcache.c b/fs/dcache.c index 3d8daaecb6d1..6d744b5e5a6c 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1667,7 +1667,10 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name) smp_store_release(&dentry->d_name.name, dname); /* ^^^ */ dentry->d_lockref.count = 1; - dentry->d_flags = 0; + if (current->flags & PF_REMOVE_DENTRY) + dentry->d_flags = DCACHE_FILE_REMOVE; + else + dentry->d_flags = 0; spin_lock_init(&dentry->d_lock); seqcount_spinlock_init(&dentry->d_seq, &dentry->d_lock); dentry->d_inode = NULL; @@ -2394,6 +2397,8 @@ void d_delete(struct dentry * dentry) * Are we the only user? */ if (dentry->d_lockref.count == 1) { + if (dentry->d_flags & DCACHE_FILE_REMOVE) + __d_drop(dentry); dentry->d_flags &= ~DCACHE_CANT_MOUNT; dentry_unlink_inode(dentry); } else { diff --git a/fs/open.c b/fs/open.c index 22adbef7ecc2..3441a004a841 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1428,7 +1428,14 @@ static long do_sys_openat2(int dfd, const char __user *filename, long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode) { struct open_how how = build_open_how(flags, mode); - return do_sys_openat2(dfd, filename, &how); + long err; + + if (flags & O_NODENTRY) + current->flags |= PF_REMOVE_DENTRY; + err = do_sys_openat2(dfd, filename, &how); + if (flags & O_NODENTRY) + current->flags &= ~PF_REMOVE_DENTRY; + return err; } diff --git a/include/linux/dcache.h b/include/linux/dcache.h index bff956f7b2b9..82ba79bc0072 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -215,7 +215,7 @@ struct dentry_operations { #define DCACHE_NOKEY_NAME BIT(25) /* Encrypted name encoded without key */ #define DCACHE_OP_REAL BIT(26) - +#define DCACHE_FILE_REMOVE BIT(27) /* remove this dentry when file is removed */ #define DCACHE_PAR_LOOKUP BIT(28) /* being looked up (with parent locked shared) */ #define DCACHE_DENTRY_CURSOR BIT(29) #define DCACHE_NORCU BIT(30) /* No RCU delay for freeing */ diff --git a/include/linux/sched.h b/include/linux/sched.h index f8d150343d42..f931a3a882e0 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1649,7 +1649,7 @@ extern struct pid *cad_pid; #define PF_USED_MATH 0x00002000 /* If unset the fpu must be initialized before use */ #define PF_USER_WORKER 0x00004000 /* Kernel thread cloned from userspace thread */ #define PF_NOFREEZE 0x00008000 /* This thread should not be frozen */ -#define PF__HOLE__00010000 0x00010000 +#define PF_REMOVE_DENTRY 0x00010000 /* Remove the dentry when the file is removed */ #define PF_KSWAPD 0x00020000 /* I am kswapd */ #define PF_MEMALLOC_NOFS 0x00040000 /* All allocations inherit GFP_NOFS. See memalloc_nfs_save() */ #define PF_MEMALLOC_NOIO 0x00080000 /* All allocations inherit GFP_NOIO. See memalloc_noio_save() */ diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h index 80f37a0d40d7..ca5f402d5e7d 100644 --- a/include/uapi/asm-generic/fcntl.h +++ b/include/uapi/asm-generic/fcntl.h @@ -89,6 +89,10 @@ #define __O_TMPFILE 020000000 #endif +#ifndef O_NODENTRY +#define O_NODENTRY 040000000 +#endif + /* a horrid kludge trying to make sure that this will fail on old kernels */ #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
Commit 681ce8623567 ("vfs: Delete the associated dentry when deleting a file") introduced an unconditional deletion of the associated dentry when a file is removed. However, this led to performance regressions in specific benchmarks, such as ilebench.sum_operations/s [0], prompting a revert in commit 4a4be1ad3a6e ("Revert 'vfs: Delete the associated dentry when deleting a file'"). This patch seeks to reintroduce the concept conditionally, where the associated dentry is deleted only when the user explicitly opts for it during file removal. There are practical use cases for this proactive dentry reclamation. Besides the Elasticsearch use case mentioned in commit 681ce8623567, additional examples have surfaced in our production environment. For instance, in video rendering services that continuously generate temporary files, upload them to persistent storage servers, and then delete them, a large number of negative dentries—serving no useful purpose—accumulate. Users in such cases would benefit from proactively reclaiming these negative dentries. This patch provides an API allowing users to actively delete these unnecessary negative dentries. Link: https://lore.kernel.org/linux-fsdevel/202405291318.4dfbb352-oliver.sang@intel.com [0] Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> --- fs/dcache.c | 7 ++++++- fs/open.c | 9 ++++++++- include/linux/dcache.h | 2 +- include/linux/sched.h | 2 +- include/uapi/asm-generic/fcntl.h | 4 ++++ 5 files changed, 20 insertions(+), 4 deletions(-)