Message ID: 20240511022729.35144-1-laoar.shao@gmail.com (mailing list archive)
State: New, archived
Series: [RFC] fs: dcache: Delete the associated dentry when deleting a file
On Fri, 10 May 2024 at 19:28, Yafang Shao <laoar.shao@gmail.com> wrote:
>
> We've devised a solution to address both issues by deleting associated
> dentry when removing a file.

This patch is buggy. You are modifying d_flags outside the locked region.

So at a minimum, the DCACHE_FILE_DELETED bit setting would need to
just go into the

        if (dentry->d_lockref.count == 1) {

side of the conditional, since the other side of that conditional
already unhashes the dentry which makes this all moot anyway.

That said, I think it's buggy in another way too: what if somebody
else looks up the dentry before it actually gets unhashed? Then you
have another ref to it, and the dentry might live long enough that it
then gets re-used for a newly created file (which is why we have those
negative dentries in the first place).

So you'd have to clear the DCACHE_FILE_DELETED if the dentry is then
made live by a file creation or rename or whatever.

So that d_flags thing is actually pretty complicated.

But since you made all this unconditional anyway, I think having a new
dentry flag is unnecessary in the first place, and I suspect you are
better off just unhashing the dentry unconditionally instead.

IOW, I think the simpler patch is likely just something like this:

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2381,6 +2381,7 @@ void d_delete(struct dentry * dentry)

        spin_lock(&inode->i_lock);
        spin_lock(&dentry->d_lock);
+       __d_drop(dentry);
        /*
         * Are we the only user?
         */
@@ -2388,7 +2389,6 @@ void d_delete(struct dentry * dentry)
                dentry->d_flags &= ~DCACHE_CANT_MOUNT;
                dentry_unlink_inode(dentry);
        } else {
-               __d_drop(dentry);
                spin_unlock(&dentry->d_lock);
                spin_unlock(&inode->i_lock);
        }

although I think Al needs to ACK this, and I suspect that unhashing
the dentry also makes that

        dentry->d_flags &= ~DCACHE_CANT_MOUNT;

pointless (because the dentry won't be reused, so DCACHE_CANT_MOUNT
just won't matter).

I do worry that there are loads that actually love our current
behavior, but maybe it's worth doing the simple unconditional "make
d_delete() always unhash" and only worry about whether that causes
performance problems for people who commonly create a new file in its
place when we get such a report.

IOW, the more complex thing might be to actually take other behavior
into account (eg "do we have so many negative dentries that we really
don't want to create new ones").

Al - can you please step in and tell us what else I've missed, and why
my suggested version of the patch is also broken garbage?

            Linus
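For reference, the locked placement described above (setting the bit only
on the sole-user branch, under d_lock) would look roughly like this. This
is a sketch only, reusing the RFC patch's DCACHE_FILE_DELETED flag, and it
still leaves the dentry-reuse problem raised above unsolved:

/* Sketch: the flag update moved inside the locked region, on the branch
 * where we are the sole user; the other branch unhashes the dentry, so
 * the flag would be moot there anyway. */
void d_delete(struct dentry *dentry)
{
        struct inode *inode = d_backing_inode(dentry);

        spin_lock(&inode->i_lock);
        spin_lock(&dentry->d_lock);
        if (dentry->d_lockref.count == 1) {
                dentry->d_flags |= DCACHE_FILE_DELETED;  /* now under d_lock */
                dentry->d_flags &= ~DCACHE_CANT_MOUNT;
                dentry_unlink_inode(dentry);    /* drops both locks */
        } else {
                __d_drop(dentry);       /* unhashed: the flag is moot here */
                spin_unlock(&dentry->d_lock);
                spin_unlock(&inode->i_lock);
        }
}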
On Sat, May 11, 2024 at 10:54 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, 10 May 2024 at 19:28, Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > We've devised a solution to address both issues by deleting associated
> > dentry when removing a file.
>
> This patch is buggy. You are modifying d_flags outside the locked region.
>
> So at a minimum, the DCACHE_FILE_DELETED bit setting would need to
> just go into the
>
>         if (dentry->d_lockref.count == 1) {
>
> side of the conditional, since the other side of that conditional
> already unhashes the dentry which makes this all moot anyway.
>
> That said, I think it's buggy in another way too: what if somebody
> else looks up the dentry before it actually gets unhashed? Then you
> have another ref to it, and the dentry might live long enough that it
> then gets re-used for a newly created file (which is why we have those
> negative dentries in the first place).
>
> So you'd have to clear the DCACHE_FILE_DELETED if the dentry is then
> made live by a file creation or rename or whatever.
>
> So that d_flags thing is actually pretty complicated.
>
> But since you made all this unconditional anyway, I think having a new
> dentry flag is unnecessary in the first place, and I suspect you are
> better off just unhashing the dentry unconditionally instead.
>
> IOW, I think the simpler patch is likely just something like this:

It's simpler. I had contemplated handling it that way, but lacked the
knowledge and courage to proceed, hence I opted for the d_flags
solution. I'll conduct tests on the revised change. Appreciate your
suggestion.

> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -2381,6 +2381,7 @@ void d_delete(struct dentry * dentry)
>
>         spin_lock(&inode->i_lock);
>         spin_lock(&dentry->d_lock);
> +       __d_drop(dentry);
>         /*
>          * Are we the only user?
>          */
> @@ -2388,7 +2389,6 @@ void d_delete(struct dentry * dentry)
>                 dentry->d_flags &= ~DCACHE_CANT_MOUNT;
>                 dentry_unlink_inode(dentry);
>         } else {
> -               __d_drop(dentry);
>                 spin_unlock(&dentry->d_lock);
>                 spin_unlock(&inode->i_lock);
>         }
>
> although I think Al needs to ACK this, and I suspect that unhashing
> the dentry also makes that
>
>         dentry->d_flags &= ~DCACHE_CANT_MOUNT;
>
> pointless (because the dentry won't be reused, so DCACHE_CANT_MOUNT
> just won't matter).
>
> I do worry that there are loads that actually love our current
> behavior, but maybe it's worth doing the simple unconditional "make
> d_delete() always unhash" and only worry about whether that causes
> performance problems for people who commonly create a new file in its
> place when we get such a report.
>
> IOW, the more complex thing might be to actually take other behavior
> into account (eg "do we have so many negative dentries that we really
> don't want to create new ones").

This poses a substantial challenge. Despite recurring community
discussions about improving negative dentry handling, there hasn't been
a consensus on how to address it.

> Al - can you please step in and tell us what else I've missed, and why
> my suggested version of the patch is also broken garbage?
>
>             Linus
On Fri, May 10, 2024 at 07:53:49PM -0700, Linus Torvalds wrote:

> although I think Al needs to ACK this, and I suspect that unhashing
> the dentry also makes that
>
>         dentry->d_flags &= ~DCACHE_CANT_MOUNT;
>
> pointless (because the dentry won't be reused, so DCACHE_CANT_MOUNT
> just won't matter).
>
> I do worry that there are loads that actually love our current
> behavior, but maybe it's worth doing the simple unconditional "make
> d_delete() always unhash" and only worry about whether that causes
> performance problems for people who commonly create a new file in its
> place when we get such a report.
>
> IOW, the more complex thing might be to actually take other behavior
> into account (eg "do we have so many negative dentries that we really
> don't want to create new ones").
>
> Al - can you please step in and tell us what else I've missed, and why
> my suggested version of the patch is also broken garbage?

Need to RTFS and think for a while; I think it should be OK, but I'll need
to dig through the tree to tell if there's anything nasty for e.g. network
filesystems.

Said that, I seriously suspect that there are loads where it would become
painful. unlink() + creat() is _not_ a rare sequence, and this would
shove an extra negative lookup into each of those.

I would like to see the details on the original poster's setup. Note that
successful rmdir() evicts all children, so it would seem that their
arseloads of negative dentries come from a bunch of unlinks in surviving
directories.

It would be interesting to see if using something like
        mkdir graveyard
        rename victim over there, then unlink in new place
        rename next victim over there, then unlink in new place
        ....
        rmdir graveyard
would change the situation with memory pressure - it would trigger
eviction of all those negatives at controlled point(s) (rmdir).
I'm not saying that it's a good mitigation, but it would get more
details on that memory pressure.
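At the syscall level, the graveyard sequence sketched above amounts to
something like the following C fragment (an illustrative sketch; names
are hypothetical and error handling is omitted):

#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

/* Move victims into a scratch directory before unlinking them, so the
 * final rmdir() evicts all the accumulated negative dentries at one
 * controlled point instead of leaving them in the surviving directory. */
static void unlink_via_graveyard(const char *const victims[], int n)
{
        char tmp[64];
        int i;

        mkdir("graveyard", 0700);
        for (i = 0; i < n; i++) {
                snprintf(tmp, sizeof(tmp), "graveyard/%d", i);
                rename(victims[i], tmp);  /* victim leaves the hot directory */
                unlink(tmp);              /* negative dentry lands in graveyard */
        }
        rmdir("graveyard");     /* evicts all children, negatives included */
}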
On Sat, May 11, 2024 at 11:36 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, May 10, 2024 at 07:53:49PM -0700, Linus Torvalds wrote:
>
> > although I think Al needs to ACK this, and I suspect that unhashing
> > the dentry also makes that
> >
> >         dentry->d_flags &= ~DCACHE_CANT_MOUNT;
> >
> > pointless (because the dentry won't be reused, so DCACHE_CANT_MOUNT
> > just won't matter).
> >
> > I do worry that there are loads that actually love our current
> > behavior, but maybe it's worth doing the simple unconditional "make
> > d_delete() always unhash" and only worry about whether that causes
> > performance problems for people who commonly create a new file in its
> > place when we get such a report.
> >
> > IOW, the more complex thing might be to actually take other behavior
> > into account (eg "do we have so many negative dentries that we really
> > don't want to create new ones").
> >
> > Al - can you please step in and tell us what else I've missed, and why
> > my suggested version of the patch is also broken garbage?
>
> Need to RTFS and think for a while; I think it should be OK, but I'll need
> to dig through the tree to tell if there's anything nasty for e.g. network
> filesystems.
>
> Said that, I seriously suspect that there are loads where it would become
> painful. unlink() + creat() is _not_ a rare sequence, and this would
> shove an extra negative lookup into each of those.
>
> I would like to see the details on the original poster's setup. Note that
> successful rmdir() evicts all children, so it would seem that their
> arseloads of negative dentries come from a bunch of unlinks in surviving
> directories.

Right, the directories are fixed. We've engaged in discussions with ES
users about changing the directory, but it would entail a significant
adjustment for them.

> It would be interesting to see if using something like
>         mkdir graveyard
>         rename victim over there, then unlink in new place
>         rename next victim over there, then unlink in new place
>         ....
>         rmdir graveyard
> would change the situation with memory pressure - it would trigger
> eviction of all those negatives at controlled point(s) (rmdir).
> I'm not saying that it's a good mitigation, but it would get more
> details on that memory pressure.
On 5/10/24 23:35, Yafang Shao wrote:
>> pointless (because the dentry won't be reused, so DCACHE_CANT_MOUNT
>> just won't matter).
>>
>> I do worry that there are loads that actually love our current
>> behavior, but maybe it's worth doing the simple unconditional "make
>> d_delete() always unhash" and only worry about whether that causes
>> performance problems for people who commonly create a new file in its
>> place when we get such a report.
>>
>> IOW, the more complex thing might be to actually take other behavior
>> into account (eg "do we have so many negative dentries that we really
>> don't want to create new ones").
> This poses a substantial challenge. Despite recurring community
> discussions about improving negative dentry handling, there hasn't been
> a consensus on how to address it.

I had suggested in the past adding a sysctl parameter to set a threshold
on how many negative dentries (as a percentage of total system memory) are
regarded as too many, and then activating some mechanism to control dentry
creation. The hard part is discarding the oldest negative dentries, as we
can have many memcg LRU lists to look at, and it can be a time-consuming
process.

We could theoretically start removing dentries on file removal when 50%
of the threshold is reached. At 100%, we start discarding old negative
dentries. At 150%, we stop negative dentry creation altogether.

My 2 cents.

Cheers,
Longman
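The staged policy described above could be expressed as something like
the sketch below (the thresholds, names, and interface are hypothetical;
no such sysctl exists):

/* nr_negative: current negative dentry count; threshold: the count the
 * hypothetical sysctl percentage resolves to. Stages at 50%/100%/150%
 * of the threshold, as described above. */
enum neg_dentry_action {
        NEG_DENTRY_OK,          /* below 50% of threshold: do nothing */
        NEG_DENTRY_RECLAIM,     /* >= 50%: drop dentries when files are removed */
        NEG_DENTRY_DISCARD,     /* >= 100%: also discard old negative dentries */
        NEG_DENTRY_BLOCK,       /* >= 150%: stop creating negative dentries */
};

static enum neg_dentry_action neg_dentry_policy(unsigned long nr_negative,
                                                unsigned long threshold)
{
        if (nr_negative >= threshold + threshold / 2)
                return NEG_DENTRY_BLOCK;
        if (nr_negative >= threshold)
                return NEG_DENTRY_DISCARD;
        if (nr_negative >= threshold / 2)
                return NEG_DENTRY_RECLAIM;
        return NEG_DENTRY_OK;
}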
On Sat, May 11, 2024 at 04:36:19AM +0100, Al Viro wrote:
> Said that, I seriously suspect that there are loads where it would become
> painful. unlink() + creat() is _not_ a rare sequence, and this would
> shove an extra negative lookup into each of those.
>
> I would like to see the details on the original poster's setup. Note that
> successful rmdir() evicts all children, so it would seem that their
> arseloads of negative dentries come from a bunch of unlinks in surviving
> directories.
>
> It would be interesting to see if using something like
>         mkdir graveyard
>         rename victim over there, then unlink in new place
>         rename next victim over there, then unlink in new place
>         ....
>         rmdir graveyard
> would change the situation with memory pressure - it would trigger
> eviction of all those negatives at controlled point(s) (rmdir).
> I'm not saying that it's a good mitigation, but it would get more
> details on that memory pressure.

BTW, how about adding to do_vfs_ioctl() something like

        case FS_IOC_FORGET:
                shrink_dcache_parent(file->f_path.dentry);
                return 0;

possibly with an option for dropping only negatives?

Even in the minimal form it would allow userland to force eviction in a
given directory tree, without disrupting things anywhere else. That's a
trivial (completely untested) patch, really -

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 1d5abfdf0f22..342bb71cf76c 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -878,6 +878,12 @@ static int do_vfs_ioctl(struct file *filp, unsigned int fd,
        case FS_IOC_GETFSSYSFSPATH:
                return ioctl_get_fs_sysfs_path(filp, argp);

+       case FS_IOC_FORGET:
+               if (arg != 0)   // only 0 for now
+                       break;
+               shrink_dcache_parent(filp->f_path.dentry);
+               return 0;
+
        default:
                if (S_ISREG(inode->i_mode))
                        return file_ioctl(filp, cmd, argp);
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 45e4e64fd664..143129510289 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -222,6 +222,7 @@ struct fsxattr {

 #define FS_IOC_GETFLAGS                _IOR('f', 1, long)
 #define FS_IOC_SETFLAGS                _IOW('f', 2, long)
+#define FS_IOC_FORGET                  _IOW('f', 255, int)
 #define FS_IOC_GETVERSION              _IOR('v', 1, long)
 #define FS_IOC_SETVERSION              _IOW('v', 2, long)
 #define FS_IOC_FIEMAP                  _IOWR('f', 11, struct fiemap)
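From userland, such an ioctl would be invoked along these lines (a
sketch; FS_IOC_FORGET is hypothetical and exists only in the untested
patch above):

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Hypothetical: matches the definition in the untested patch above. */
#define FS_IOC_FORGET   _IOW('f', 255, int)

/* Ask the kernel to drop cached dentries under the given directory. */
static int forget_dir(const char *path)
{
        int ret, fd = open(path, O_RDONLY | O_DIRECTORY);

        if (fd < 0)
                return -1;
        ret = ioctl(fd, FS_IOC_FORGET, 0);      /* 0 is the only mode for now */
        close(fd);
        return ret;
}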
On Fri, 10 May 2024 at 20:36, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Said that, I seriously suspect that there are loads where it would become
> painful. unlink() + creat() is _not_ a rare sequence, and this would
> shove an extra negative lookup into each of those.

The other side of this is that the "lots of negative dentries after
people have removed files" is definitely something that has come up
before as a latency problem.

So I do wonder if we've just been too worried about not changing the
status quo, and maybe the whole "make unlink() turn a positive dentry
into a negative one" was always a mis-optimization.

I do agree that we might have to do something more complicated, but I
think that before we just _assume_ we have to do that, maybe we should
just do the simple and stupid thing.

Because while "unlink and create" is most definitely a very real
pattern, maybe it's not really _so_ common that we should care about it
as a primary issue?

The reason to keep the negative dentry around is to avoid the
unnecessary "->lookup()" followed by "->create()" directory operations.
But network filesystems - where that is _particularly_ expensive - end
up having done the whole ->atomic_open thing anyway, so maybe that
reason isn't as big of a deal as it used to be?

And I don't particularly like your "give people a flush ioctl" either.
There are security concerns with that one - both for timing and for
just basically messing with performance. At the very least, I think it
should require write permissions on the directory you are flushing, but
I really think we should instead aim to be better about this in the
first place.

Anyway, in how many loads is that "unlink -> create" pattern more than
a very occasional thing?

Which is why I think maybe we're just overthinking this and being too
timid when saying "we should count negative dentries or something".

            Linus
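The pattern in question is simply this (an illustrative sketch; the
helper name is hypothetical):

#include <fcntl.h>
#include <unistd.h>

/* unlink() + creat(): with the negative dentry kept after unlink(), the
 * O_CREAT open can turn it positive without a fresh lookup; if d_delete()
 * always unhashes, the open pays one extra negative lookup instead. */
static int replace_file(const char *path)
{
        unlink(path);
        return open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
}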
On Sat, May 11, 2024 at 12:54:18AM -0400, Waiman Long wrote:
> I had suggested in the past adding a sysctl parameter to set a threshold
> on how many negative dentries (as a percentage of total system memory) are

Yes, but that's obviously bogus. System memory is completely
uncorrelated with the optimum number of negative dentries to keep
around for workload performance.

Some multiple of "number of positive dentries" might make sense.
On Sat, 11 May 2024 at 08:59, Matthew Wilcox <willy@infradead.org> wrote:
>
> Some multiple of "number of positive dentries" might make sense.

The thing is, it would probably have to be per parent-dentry to be
useful, since that is typically what causes latency concerns: not
"global negative dentries", but "negative dentries that have to be
flushed for this parent as it is deleted".

And you'd probably need to make the child list be some kind of LRU list
to make this all effective.

Which is all likely rather expensive in both CPU time (stats tend to
cause lots of dirty caches) and in memory (the dentry would grow for
both stats and for the d_children list - it would almost certainly have
to change from a 'struct hlist_head' to a 'struct list_head' so that
you can put things either at the beginning or the end).

I dunno. I haven't looked at just *how* nasty it would be. Maybe it
wouldn't be as bad as I suspect it would be.

Now, the other option might be to just make the latency concerns
smaller. It's not like removing negative dentries is very costly per
se. I think the issue has always been the dcache_lock, not the work to
remove the dentries themselves.

            Linus
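For illustration, the data-structure change being weighed above would
take roughly this shape (a sketch; mainline keeps an hlist here and has
no per-child LRU ordering, and the names are illustrative):

#include <linux/list.h>

/* A list_head allows insertion at either end, so a parent could keep
 * children in rough LRU order and evict negatives from the tail; an
 * hlist_head only supports insertion at the head. */
struct dentry_children_sketch {
        struct list_head d_children;    /* head: recently used; tail: old negatives */
        struct list_head d_sib;         /* sibling link in the parent's d_children */
};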
On Sat, 11 May 2024 at 09:07, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Now, the other option might be to just make the latency concerns
> smaller. It's not like removing negative dentries is very costly per
> se. I think the issue has always been the dcache_lock, not the work to
> remove the dentries themselves.

Actually, going back to re-read this particular report, at least this
time it was the inode lock of the parent, not the dcache_lock.

But the point ends up being the same - lots of negative dentries aren't
necessarily a problem in themselves, because the common operation that
matters is the hash lookup, which scales fairly well. They mainly tend
to become a problem when they hold up something else.

So better batching, or maybe just walking the negative child dentry
list after having marked the parent dead and then released the lock,
might also be the solution.

            Linus
On Sat, 11 May 2024 at 09:13, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So better batching, or maybe just walking the negative child dentry
> list after having marked the parent dead and then released the lock,
> might also be the solution.

IOW, maybe the solution is something as simple as this UNTESTED patch
instead?

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4207,16 +4207,19 @@ int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir,
        if (error)
                goto out;

-       shrink_dcache_parent(dentry);
        dentry->d_inode->i_flags |= S_DEAD;
        dont_mount(dentry);
        detach_mounts(dentry);
+       inode_unlock(dentry->d_inode);
+
+       shrink_dcache_parent(dentry);
+       dput(dentry);
+       d_delete_notify(dir, dentry);
+       return 0;

 out:
        inode_unlock(dentry->d_inode);
        dput(dentry);
-       if (!error)
-               d_delete_notify(dir, dentry);
        return error;
 }
 EXPORT_SYMBOL(vfs_rmdir);

where that "shrink_dcache_parent()" will still be quite expensive if the
directory has a ton of negative dentries, but because we now free them
after we've marked the dentry dead and released the inode, nobody much
cares any more?

Note (as usual) that this is untested, and maybe I haven't thought
everything through and I might be missing some important detail. But I
think shrinking the children later should be just fine - there's
nothing people can *do* with them. At worst they are reachable for some
lookup that doesn't take the locks, but that will just result in a
negative dentry which is all good, since they cannot become positive
dentries any more at this point.

So I think the inode lock is irrelevant for negative dentries, and
shrinking them outside the lock feels like a very natural thing to do.

Again: this is more of a "brainstorming patch" than an actual
suggestion. Yafang - it might be worth testing for your load, but
please do so knowing that it might have some consistency issues, so
don't test it on any production machinery, please ;)

Can anybody point out why I'm being silly, and the above change is
completely broken garbage? Please use small words to point out my
mental deficiencies to make sure I get it.

Just to clarify: this obviously will *not* speed up the actual rmdir
itself. *ALL* it does is to avoid holding the lock over the potentially
long cleanup operation. So the latency of the rmdir is still the same,
but now it should no longer matter for anything else.

            Linus
On Sat, May 11, 2024 at 11:35 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Sat, May 11, 2024 at 10:54 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > But since you made all this unconditional anyway, I think having a new
> > dentry flag is unnecessary in the first place, and I suspect you are
> > better off just unhashing the dentry unconditionally instead.
> >
> > IOW, I think the simpler patch is likely just something like this:
>
> It's simpler. I had contemplated handling it that way, but lacked the
> knowledge and courage to proceed, hence I opted for the d_flags
> solution. I'll conduct tests on the revised change. Appreciate your
> suggestion.

We have successfully applied a hotfix to a subset of our production
servers, totaling several thousand. The hotfix is as follows:

diff --git a/fs/dcache.c b/fs/dcache.c
index 52e6d5f..30eb733 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2557,14 +2557,14 @@ void d_delete(struct dentry * dentry)

        spin_lock(&inode->i_lock);
        spin_lock(&dentry->d_lock);
+       __d_drop(dentry);
+
        /*
         * Are we the only user?
         */
        if (dentry->d_lockref.count == 1) {
-               dentry->d_flags &= ~DCACHE_CANT_MOUNT;
                dentry_unlink_inode(dentry);
        } else {
-               __d_drop(dentry);
                spin_unlock(&dentry->d_lock);
                spin_unlock(&inode->i_lock);
        }

So far, it has been functioning well without any regressions. We are
planning to roll this update out to our entire fleet, which consists of
hundreds of thousands of servers.

I believe this change is still necessary. Would you prefer to commit it
directly, or should I send an official patch? If the "unlink-create"
issue is a concern, perhaps we can address it by adding a
/sys/kernel/debug/vfs/delete_file_legacy entry?

--
Regards
Yafang
On Tue, 14 May 2024 at 19:19, Yafang Shao <laoar.shao@gmail.com> wrote:
>
> I believe this change is still necessary. Would you prefer to commit
> it directly, or should I send an official patch?

Sending an official patch just for people to try out sounds good
regardless, but I'd really like some real performance testing on other
loads too before just taking it.

It *may* be acceptable, and it's certainly simple. It's also almost
certainly safe from a correctness angle. But it might regress
performance on other important loads even if it fixes an issue on your
load. That's why we had the whole discussion about alternatives.

I'm still of the opinion that we should probably *try* this simple
approach, but I really would hope we could have some test tree that
people run a lot of benchmarks on. Does anybody do filesystem
benchmarking on the mm tree? Or do we have some good tree to just give
it a good testing?

It feels a bit excessive to put it in my development tree just to get
some performance testing coverage, but maybe that's what we have to
do..

            Linus
diff --git a/fs/dcache.c b/fs/dcache.c
index 71a8e943a0fa..4b97f60f0e64 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -701,6 +701,9 @@ static inline bool retain_dentry(struct dentry *dentry, bool locked)
        if (unlikely(d_flags & DCACHE_DONTCACHE))
                return false;

+       if (unlikely(dentry->d_flags & DCACHE_FILE_DELETED))
+               return false;
+
        // At this point it looks like we ought to keep it. We also might
        // need to do something - put it on LRU if it wasn't there already
        // and mark it referenced if it was on LRU, but not marked yet.
@@ -2392,6 +2395,7 @@ void d_delete(struct dentry * dentry)
                spin_unlock(&dentry->d_lock);
                spin_unlock(&inode->i_lock);
        }
+       dentry->d_flags |= DCACHE_FILE_DELETED;
 }
 EXPORT_SYMBOL(d_delete);

diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index bf53e3894aae..55a69682918c 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -210,7 +210,7 @@ struct dentry_operations {
 #define DCACHE_NOKEY_NAME      BIT(25) /* Encrypted name encoded without key */
 #define DCACHE_OP_REAL         BIT(26)
-
+#define DCACHE_FILE_DELETED    BIT(27) /* File is deleted */
 #define DCACHE_PAR_LOOKUP      BIT(28) /* being looked up (with parent locked shared) */
 #define DCACHE_DENTRY_CURSOR   BIT(29)
 #define DCACHE_NORCU           BIT(30) /* No RCU delay for freeing */
Our applications, built on Elasticsearch[0], frequently create and
delete files. These applications operate within containers, some with a
memory limit exceeding 100GB. Over prolonged periods, the accumulation
of negative dentries within these containers can amount to tens of
gigabytes.

Upon container exit, directories are deleted. However, due to the
numerous associated dentries, this process can be time-consuming. Our
users have expressed frustration with this prolonged exit duration,
which constitutes our first issue.

Simultaneously, other processes may attempt to access the parent
directory of the Elasticsearch directories. Since the task responsible
for deleting the dentries holds the inode lock, processes attempting
directory lookup experience significant delays. This second issue is
easily demonstrated:

- Task 1 generates negative dentries:

  $ pwd
  ~/test
  $ mkdir es && cd es/ && ./create_and_delete_files.sh
  [ After generating tens of GB dentries ]
  $ cd ~/test && rm -rf es
  [ It will take a long duration to finish ]

- Task 2 attempts to look up the 'test/' directory:

  $ pwd
  ~/test
  $ ls

The 'ls' command in Task 2 experiences prolonged execution while Task 1
is deleting the dentries.

We've devised a solution to address both issues by deleting the
associated dentry when removing a file. Interestingly, we've noted that
a similar patch was proposed years ago[1], although it was rejected
citing the absence of tangible issues caused by negative dentries.
Given our current challenges, we're resubmitting the proposal. All
relevant stakeholders from previous discussions have been included for
reference.

[0]. https://github.com/elastic/elasticsearch
[1]. https://patchwork.kernel.org/project/linux-fsdevel/patch/1502099673-31620-1-git-send-email-wangkai86@huawei.com

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Wangkai <wangkai86@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Colin Walters <walters@verbum.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/dcache.c            | 4 ++++
 include/linux/dcache.h | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)
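The create_and_delete_files.sh script itself was not posted; the
workload it stands for amounts to roughly the loop below (a C sketch,
with all names hypothetical), each iteration of which leaves one
negative dentry behind:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
        char name[64];
        unsigned long i;

        for (i = 0; i < 100000000UL; i++) {  /* long enough to pile up negatives */
                snprintf(name, sizeof(name), "f-%lu", i);
                int fd = open(name, O_CREAT | O_WRONLY, 0644);

                if (fd < 0)
                        break;
                close(fd);
                unlink(name);   /* the dentry for "name" goes negative here */
        }
        return 0;
}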