Message ID | 1502099673-31620-1-git-send-email-wangkai86@huawei.com (mailing list archive)
State | New, archived |
> -----Original Message----- > From: Wangkai (Kevin,C) > Sent: Monday, August 07, 2017 5:55 PM > To: linux-fsdevel@vger.kernel.org; viro@zeniv.linux.org.uk > Cc: Wangkai (Kevin,C); Renjinyong (Renjinyong, Business Support Dept) > Subject: [PATCH] fs/dcache: dentries should free after files unlinked or > directories removed > > sometimes, on my server, there were lots of files creating and removing, and I > found that memory usage were high, and I checked, almost all the memory was > occupied by slab recliamable, the slab struct was dentry, and I checked the > code of dcache, and found that when a file was deleted the dentry never free, > unless a memory recliam was triggerd. > > I made this patch to mark the dentry as a remove state after file unlinked or > directory removed, and when the dentry’s reference count dec to zero and > free it, and it worked well on my server base on kernel 4.4. > > > Signed-off-by: Wangkai <wangkai86@huawei.com> > --- > fs/dcache.c | 12 +++++++++++- > fs/namei.c | 6 ++++++ > include/linux/dcache.h | 2 +- > 3 files changed, 18 insertions(+), 2 deletions(-) > mode change 100644 => 100755 fs/dcache.c > mode change 100644 => 100755 fs/namei.c > mode change 100644 => 100755 include/linux/dcache.h > > diff --git a/fs/dcache.c b/fs/dcache.c > old mode 100644 > new mode 100755 > index f901413..6828463 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -722,7 +722,8 @@ static inline bool fast_dput(struct dentry *dentry) > */ > smp_rmb(); > d_flags = ACCESS_ONCE(dentry->d_flags); > - d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST | > DCACHE_DISCONNECTED; > + d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST | > DCACHE_DISCONNECTED > + | DCACHE_FILE_REMOVED; > > /* Nothing to do? Dropping the reference was all we needed? 
*/ > if (d_flags == (DCACHE_REFERENCED | DCACHE_LRU_LIST) > && !d_unhashed(dentry)) > @@ -816,6 +817,15 @@ void dput(struct dentry *dentry) > dentry_lru_add(dentry); > > dentry->d_lockref.count--; > + > + /* > + * if file has been declare as removed and reference count is zero > + * then we can free the dentry rather than leave it stay in dcache > + */ > + if (unlikely(dentry->d_flags & DCACHE_FILE_REMOVED)) { > + if (dentry->d_lockref.count == 0) > + goto kill_it; > + } > spin_unlock(&dentry->d_lock); > return; > > diff --git a/fs/namei.c b/fs/namei.c > old mode 100644 > new mode 100755 > index ddb6a7c..0ec1478 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -3918,6 +3918,9 @@ static long do_rmdir(int dfd, const char __user > *pathname) > goto exit3; > error = vfs_rmdir(path.dentry->d_inode, dentry); > exit3: > + /* after remove the dir set dentry remove flag */ > + if (!error) > + dentry->d_flags |= DCACHE_FILE_REMOVED; > dput(dentry); > exit2: > inode_unlock(path.dentry->d_inode); > @@ -4042,6 +4045,9 @@ static long do_unlinkat(int dfd, const char __user > *pathname) > goto exit2; > error = vfs_unlink(path.dentry->d_inode, dentry, &delegated_inode); > exit2: > + /* after unlink file set dentry remove flag */ > + if (!error) > + dentry->d_flags |= DCACHE_FILE_REMOVED; > dput(dentry); > } > inode_unlock(path.dentry->d_inode); > diff --git a/include/linux/dcache.h b/include/linux/dcache.h > old mode 100644 > new mode 100755 > index aae1cdb..2d65bd6 > --- a/include/linux/dcache.h > +++ b/include/linux/dcache.h > @@ -215,7 +215,7 @@ struct dentry_operations { > #define DCACHE_FALLTHRU 0x01000000 /* Fall through to lower > layer */ > #define DCACHE_ENCRYPTED_WITH_KEY 0x02000000 /* dir is encrypted > with a valid key */ > #define DCACHE_OP_REAL 0x04000000 > - > +#define DCACHE_FILE_REMOVED 0x08000000 /* file or dir has been > unlinked or removed */ > #define DCACHE_PAR_LOOKUP 0x10000000 /* being looked up (with > parent locked shared) */ > #define DCACHE_DENTRY_CURSOR 
0x20000000 > > -- > 2.8.0.GIT Hi, all The negative dentries keep growing and waste a lots of kernel memory, this problem has been Occurred on my server, and I looked for internet, and many people had met the same problem. Recently, I discussed with Longman, we have two different patches to solve this problem, In my patch, remove the negative dentries with the files unlinked, I checked 15 years ago, Viro and linus had talked about this, that unlink was only worth doing. ref: http://yarchive.net/comp/linux/negative_dentries.html in Longman patch, limit the negative dentries number maybe we can discuss. Thanks, Kevin
On Fri 25-08-17 07:06:50, Wangkai (Kevin,C) wrote:
> > -----Original Message-----
> > From: Wangkai (Kevin,C)
> > Sent: Monday, August 07, 2017 5:55 PM
> > To: linux-fsdevel@vger.kernel.org; viro@zeniv.linux.org.uk
> > Cc: Wangkai (Kevin,C); Renjinyong (Renjinyong, Business Support Dept)
> > Subject: [PATCH] fs/dcache: dentries should free after files unlinked or
> > directories removed
> >
> > Sometimes there are lots of files being created and removed on my server,
> > and I found that memory usage was high. When I checked, almost all of the
> > memory was occupied by reclaimable slab, and the slab in question was the
> > dentry cache. I read the dcache code and found that when a file is
> > deleted its dentry is never freed unless memory reclaim is triggered.
> >
> > This patch marks the dentry as removed after the file is unlinked or the
> > directory is removed, and frees the dentry once its reference count drops
> > to zero. It has worked well on my server, based on kernel 4.4.
> >
> > Signed-off-by: Wangkai <wangkai86@huawei.com>
...
> Hi, all
>
> The negative dentries keep growing and waste a lot of kernel memory. This
> problem has occurred on my server, and when I searched the internet I found
> that many people have hit it as well.
>
> Recently I discussed this with Longman, and we have two different patches
> to solve the problem. My patch removes the negative dentries when the files
> are unlinked; I found that 15 years ago Viro and Linus discussed this, and
> the conclusion was that freeing on unlink was the only case worth doing.
> ref: http://yarchive.net/comp/linux/negative_dentries.html
>
> Longman's patch limits the number of negative dentries.
>
> Maybe we can discuss.

So both you and Waiman complain about negative dentries consuming space (and
I agree, they do) but neither of you has explained why it is a problem. If
memory is ever needed, negative dentries are very easy to reclaim. So to some
extent this is like complaining that page cache consumes your memory - which
is again true but it is a deliberate decision and it helps performance.

It is possible that some of these dentries are so rarely used that they are
indeed just a waste, but then I'd like to see a detailed analysis of which
negative dentries these are and how your reclaim heuristics improve the
situation. But I haven't seen any performance numbers from either you or
Waiman. So please gather some performance numbers justifying your change so
that we have something to talk about...

								Honza
On Fri, Aug 25, 2017, at 10:47 AM, Jan Kara wrote:
>
> It is possible that some of these dentries are so rarely used that they are
> indeed just a waste

In some cases - think containers, or ostree-style root filesystem snapshots -
if we do an `rm -rf /path/to/container-root`, userspace knows for a fact that
nothing will reference those paths again: all of the processes that could
have referenced them have been killed. There's no point in having negative
dentries for them.

Maybe something like unlinkat(dfd, path, AT_UNLINKAT_DONTNEED), like
madvise(MADV_DONTNEED)?
On Fri, Aug 25, 2017 at 8:10 AM, Colin Walters <walters@verbum.org> wrote:
> On Fri, Aug 25, 2017, at 10:47 AM, Jan Kara wrote:
>>
>> It is possible that some of these dentries are so rarely used that they
>> are indeed just a waste
>
> In some cases - think containers, or ostree-style root filesystem
> snapshots - if we do an `rm -rf /path/to/container-root`, userspace knows
> for a fact that nothing will reference those paths again: all of the
> processes that could have referenced them have been killed. There's no
> point in having negative dentries for them.
>
> Maybe something like unlinkat(dfd, path, AT_UNLINKAT_DONTNEED), like
> madvise(MADV_DONTNEED)?

No, I think the right fix is to just prune the child dentries when doing an
rmdir - and I thought we did that already.

IOW, when doing "rmdir()" on a dentry, we should not prune *that* dentry,
but we should prune all the dentries (recursively) under it.

.. goes to look ..

Yeah, look at vfs_rmdir(): it does that

    shrink_dcache_parent(dentry);

already. So when you do a "rm -rf something", you should already have *no*
extra negative dentries - there should be exactly one negative dentry
remaining (the dentry that used to be the directory itself).

So I really don't see what the problem is. If you have lots of dentries left
over after a "rm -rf /path/to/container-root", something else is wrong.

The only time we have negative dentries is:

 (a) lookups that didn't match.

     These are CRITICALLY IMPORTANT. They are very very common.

     Why? Think of all the PATH-like behavior that Unix traditionally has:
     not just PATH itself, but look at what processes do with 'strace'. They
     end up searching for things like translation files for error messages,
     for shared libraries, for a number of things using path-like lookups,
     and it's actually really important that they do *not* keep calling down
     to the filesystem to look for a path that doesn't exist.

     That's often the most expensive filesystem operation there is -
     because in a big directory (and again - think about PATH - it often
     traverses some of the biggest directories around), many filesystems end
     up walking the whole directory before they say "oops, I didn't find
     that file".

 (b) individually removed files.

     These aren't as important, and we could possibly shrink things, but
     they really shouldn't matter. How often do you remove a ton of files
     without removing the directory they are in? Not often.

 (c) renames etc. These tend to be even less of an issue.

So I really think that you shouldn't have that many negative dentries to
begin with under normal load, but even if you do, they should be really easy
to prune, like Jan says.

So send out a real load with real numbers. None of this touchy-feely thing
that seems to be wrong. Ok?

Because maybe we have a bug, and that shrink_dcache_parent() thing doesn't
work. That would be interesting and relevant and a bug, so definitely worth
fixing.

(Side note: the shrink_dcache_parent() thing is actually done before the
filesystem rmdir() is called, which means that you can use a "rmdir()" as a
MADV_DONTNEED on the dentry tree below that directory. Just make sure the
directory isn't empty, so that it doesn't actually get deleted.)

                 Linus
> So both you and Waiman complain about negative dentries consuming space
> (and I agree, they do) but neither of you has explained why it is a
> problem. If memory is ever needed, negative dentries are very easy to
> reclaim. So to some extent this is like complaining that page cache
> consumes your memory - which is again true but it is a deliberate decision
> and it helps performance.
>
> It is possible that some of these dentries are so rarely used that they
> are indeed just a waste, but then I'd like to see a detailed analysis of
> which negative dentries these are and how your reclaim heuristics improve
> the situation. But I haven't seen any performance numbers from either you
> or Waiman. So please gather some performance numbers justifying your
> change so that we have something to talk about...

I am sorry, so let me explain my problem: on my Linux machine (kernel 4.4)
there was one program, "foo", doing some data backup and creating backup
files one by one, named sequentially like foo.1.bak, foo.2.bak, ...,
foo.n.bak - e.g. create foo.2.bak, then delete foo.1.bak; create foo.3.bak,
then delete foo.2.bak. One day I found that memory usage was high (total
memory was 8G):

/proc/meminfo:
SReclaimable:    7634344 kB

After checking the slabinfo:

dentry    40001682 40001682    192   21    1 : tunables

and the dentry-state:

40001058 39990056 45 0 0 0

The foo program had created about 40 million files, with one file remaining;
all the others had been deleted.

At that time I also had a kernel module allocating pages with the GFP_ATOMIC
flag, and sometimes the allocation failed because reclaim was not fast
enough (maybe I should modify the memory reclaim parameters...).
Another performance issue with keeping a large number of dentries is that
dentry lookup slows down. The test: I tried to create, close, and remove
different files with different numbers of dentries present (on x86_64,
kernel 4.4, 12-cpu Intel(R) Xeon(R) CPU E5645 @ 2.40GHz):

When the dentry count was 18800:
  open 1000 files,   avg time per call 10321.671 ns
  close 1000 files,  avg time per call 455.299 ns
  unlink 1000 files, avg time per call 5179.519 ns

When the dentry count was 40001058:
  open 1000 files,   avg time per call 13483.361 ns
  close 1000 files,  avg time per call 455.067 ns
  unlink 1000 files, avg time per call 7645.616 ns

Actually, I can modify the program "foo" and make the backup file names wrap
around, like foo.1.bak, foo.2.bak, ..., foo.100.bak -> foo.1.bak. But I am
worried that if there are programs that create and delete many unique
temporary files, the negative dentries will keep growing.

Thanks,
Kevin
On Fri, Aug 25, 2017 at 11:56 PM, Wangkai (Kevin,C) <wangkai86@huawei.com> wrote:
>
> but I am worried that if there are programs that create and delete many
> unique temporary files, the negative dentries will keep growing.

The thing is, this has nothing to do with unlink.

The *easiest* way to generate negative dentries is in fact to never create
any files at all: just look up millions of non-existent names.

IOW, just something like this

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main()
    {
            int i;
            for (i = 0; i < 10000000; i++) {
                    char name[20];
                    struct stat st;

                    snprintf(name, sizeof(name), "n:%d", i);
                    stat(name, &st);
            }
            return 0;
    }

is a much easier and faster way to create negative dentries.

And yes, it's entirely possible that we could/should have some way to
balance negative dentries against positive ones, but on the whole this has
not really come up as a huge problem.

For example, your module that does a lot of GFP_ATOMIC allocations - if it
wasn't for dentries, it would have been something else. GFP_ATOMIC *will*
fail after a while, because it just can't replenish the free memory. That's
fundamental. That's what GFP_ATOMIC _means_. It's very much meant for
"occasional critical allocations", and if you do just GFP_ATOMIC, you will
fail.

                 Linus
On 08/26/2017 12:18 PM, Linus Torvalds wrote:
> On Fri, Aug 25, 2017 at 11:56 PM, Wangkai (Kevin,C)
> <wangkai86@huawei.com> wrote:
>> but I am worried that if there are programs that create and delete many
>> unique temporary files, the negative dentries will keep growing.
> The thing is, this has nothing to do with unlink.
>
> The *easiest* way to generate negative dentries is in fact to never
> create any files at all: just look up millions of non-existent names.
>
> IOW, just something like this
>
>     #include <stdio.h>
>     #include <sys/types.h>
>     #include <sys/stat.h>
>     #include <unistd.h>
>
>     int main()
>     {
>             int i;
>             for (i = 0; i < 10000000; i++) {
>                     char name[20];
>                     struct stat st;
>
>                     snprintf(name, sizeof(name), "n:%d", i);
>                     stat(name, &st);
>             }
>             return 0;
>     }
>
> is a much easier and faster way to create negative dentries.
>
> And yes, it's entirely possible that we could/should have some way to
> balance negative dentries against positive ones, but on the whole this
> has not really come up as a huge problem.

It is certainly true that the current scheme of unlimited negative dentry
creation is not a problem in most cases. However, there are scenarios where
it can be a problem.

A customer discovered the negative dentry issue because of a bug in their
application code. They fixed their code to solve the problem. However, they
wondered if this could be used as one vector of a DoS attack on a Linux
system, by having a rogue program generate a massive number of negative
dentries continuously. It is the thought of this malicious use of the
negative dentry behavior that prompted me to create and send out a patch to
limit the number of negative dentries allowable in a system.

Besides, Kevin has shown that keeping the dentry cache from growing too big
is good for file lookup performance too.

Cheers,
Longman

> For example, your module that does a lot of GFP_ATOMIC allocations -
> if it wasn't for dentries, it would have been something else.
> GFP_ATOMIC *will* fail after a while, because it just can't replenish
> the free memory. That's fundamental. That's what GFP_ATOMIC _means_.
> It's very much meant for "occasional critical allocations", and if you
> do just GFP_ATOMIC, you will fail.
>
> Linus
> -----Original Message----- > From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus > Torvalds > Sent: Sunday, August 27, 2017 12:19 AM > To: Wangkai (Kevin,C) > Cc: Jan Kara; viro@zeniv.linux.org.uk; Waiman Long; walters@verbum.org; > linux-fsdevel@vger.kernel.org; Renjinyong (Renjinyong, Business Support Dept) > Subject: Re: [PATCH] fs/dcache: dentries should free after files unlinked or > directories removed > > On Fri, Aug 25, 2017 at 11:56 PM, Wangkai (Kevin,C) > <wangkai86@huawei.com> wrote: > > > > but I am worried, if there are programs create,delete many temporary files > and unique, > > the negative dentries will keep growing. > > The thing is, this has nothing to do with unlink. > > The *easiest* way to generate negative dentries is in fact to never > create any files at all: just look up millions of non-existent names. > > IOW, just something like this > > #include <stdio.h> > #include <sys/types.h> > #include <sys/stat.h> > #include <unistd.h> > > int main() > { > int i; > for (i = 0; i < 10000000; i++) { > char name[20]; > struct stat st; > > snprintf(name, sizeof(name), "n:%d", i); > stat(name, &st); > } > return 0; > } > > is a much easier and faster way to create negative dentries. > > And yes, it's entirely possible that we could/should have some way to > balance negative dentries against positive ones, but on the whole this > has not really come up as a huge problem. > > For example, your module that does a lot of GFP_ATOMIC allocations - > if it wasn't for dentries, it would have been something else. > GFP_ATOMIC *will* fail after a while, because it just can't replenish > the free memory. That's fundamental. That's what GFP_ATOMIC _means_. > It's very much meant for "occasional critical allocations", and if you > do just GFP_ATOMIC, you will fail. 
> > Linus (1) yes, negative dentries can come up in two ways: i) look up different files ii) create and delete different files (I have met) (2) the module allocation with flag GFP_ATOMIC was called from irq, which do some special packets receiving, called dev_alloc_skb, in this function the memory allocation with flag GFP_ATOMIC. Regards, Kevin
On Sun 27-08-17 11:05:34, Waiman Long wrote:
> On 08/26/2017 12:18 PM, Linus Torvalds wrote:
> > On Fri, Aug 25, 2017 at 11:56 PM, Wangkai (Kevin,C)
> > <wangkai86@huawei.com> wrote:
> >> but I am worried that if there are programs that create and delete many
> >> unique temporary files, the negative dentries will keep growing.
> > The thing is, this has nothing to do with unlink.
> >
> > The *easiest* way to generate negative dentries is in fact to never
> > create any files at all: just look up millions of non-existent names.
> >
> > IOW, just something like this
> >
> >     #include <stdio.h>
> >     #include <sys/types.h>
> >     #include <sys/stat.h>
> >     #include <unistd.h>
> >
> >     int main()
> >     {
> >             int i;
> >             for (i = 0; i < 10000000; i++) {
> >                     char name[20];
> >                     struct stat st;
> >
> >                     snprintf(name, sizeof(name), "n:%d", i);
> >                     stat(name, &st);
> >             }
> >             return 0;
> >     }
> >
> > is a much easier and faster way to create negative dentries.
> >
> > And yes, it's entirely possible that we could/should have some way to
> > balance negative dentries against positive ones, but on the whole this
> > has not really come up as a huge problem.
>
> It is certainly true that the current scheme of unlimited negative
> dentry creation is not a problem in most cases. However, there are
> scenarios where it can be a problem.
>
> A customer discovered the negative dentry issue because of a bug in
> their application code. They fixed their code to solve the problem.
> However, they wondered if this could be used as one vector of a DoS
> attack on a Linux system, by having a rogue program generate a massive
> number of negative dentries continuously. It is the thought of this
> malicious use of the negative dentry behavior that prompted me to create
> and send out a patch to limit the number of negative dentries allowable
> in a system.

Well, and how is this fundamentally different from a user consuming
resources by other means (positive dentries, inode cache, page cache, anon
memory, etc.)? Sure, you can force slab reclaim to work hard, but there are
many other ways a local user can do that. So if you can demonstrate that it
is too easy to DoS a system in some way, we can talk about mitigating the
attack. But just the ability to make the system busy does not seem serious
to me.

> Besides, Kevin had shown that keeping the dentry cache from growing too
> big was good for file lookup performance too.

Well, that rather speaks for a better data structure for dentry lookup
(e.g. growing hash tables) than for limiting negative dentries? I can
imagine there are workloads which would benefit from that as well?

								Honza
On 08/31/2017 03:53 AM, Jan Kara wrote:
> On Sun 27-08-17 11:05:34, Waiman Long wrote:
>>
>> It is certainly true that the current scheme of unlimited negative
>> dentry creation is not a problem in most cases. However, there are
>> scenarios where it can be a problem.
>>
>> A customer discovered the negative dentry issue because of a bug in
>> their application code. They fixed their code to solve the problem.
>> However, they wondered if this could be used as one vector of a DoS
>> attack on a Linux system, by having a rogue program generate a massive
>> number of negative dentries continuously. It is the thought of this
>> malicious use of the negative dentry behavior that prompted me to create
>> and send out a patch to limit the number of negative dentries allowable
>> in a system.
> Well, and how is this fundamentally different from a user consuming
> resources by other means (positive dentries, inode cache, page cache, anon
> memory, etc.)? Sure, you can force slab reclaim to work hard, but there
> are many other ways a local user can do that. So if you can demonstrate
> that it is too easy to DoS a system in some way, we can talk about
> mitigating the attack. But just the ability to make the system busy does
> not seem serious to me.

Positive dentries are limited by the total number of files in the file
system. Negative dentries, on the other hand, have no such limit. There are
ways to limit other resource usage, such as limiting a user's memory usage
with a memory cgroup, or using filesystem quotas for the amount of disk
space or the number of files created or owned, etc. However, I am not aware
of any control mechanism that can limit the number of negative dentries
generated by a given user. That makes negative dentries somewhat different
from the other resource types that you are talking about.

>> Besides, Kevin had shown that keeping the dentry cache from growing too
>> big was good for file lookup performance too.
> Well, that rather speaks for a better data structure for dentry lookup
> (e.g. growing hash tables) than for limiting negative dentries? I can
> imagine there are workloads which would benefit from that as well?

Current dentry lookup goes through a hash table. Lookup performance depends
on the number of hash slots as well as the number of entries queued in each
slot. In general, lookup performance deteriorates the more entries you put
into a given slot, and that is true no matter how many slots you have
allocated.

Cheers,
Longman
On Thu 31-08-17 12:27:27, Waiman Long wrote:
> On 08/31/2017 03:53 AM, Jan Kara wrote:
> > On Sun 27-08-17 11:05:34, Waiman Long wrote:
> >>
> >> It is certainly true that the current scheme of unlimited negative
> >> dentry creation is not a problem in most cases. However, there are
> >> scenarios where it can be a problem.
> >>
> >> A customer discovered the negative dentry issue because of a bug in
> >> their application code. They fixed their code to solve the problem.
> >> However, they wondered if this could be used as one vector of a DoS
> >> attack on a Linux system, by having a rogue program generate a massive
> >> number of negative dentries continuously. It is the thought of this
> >> malicious use of the negative dentry behavior that prompted me to
> >> create and send out a patch to limit the number of negative dentries
> >> allowable in a system.
> > Well, and how is this fundamentally different from a user consuming
> > resources by other means (positive dentries, inode cache, page cache,
> > anon memory, etc.)? Sure, you can force slab reclaim to work hard, but
> > there are many other ways a local user can do that. So if you can
> > demonstrate that it is too easy to DoS a system in some way, we can talk
> > about mitigating the attack. But just the ability to make the system
> > busy does not seem serious to me.
>
> Positive dentries are limited by the total number of files in the file
> system. Negative dentries, on the other hand, have no such limit. There
> are ways to limit other resource usage, such as limiting a user's memory
> usage with a memory cgroup, or using filesystem quotas for the amount of
> disk space or the number of files created or owned, etc. However, I am
> not aware of any control mechanism that can limit the number of negative
> dentries generated by a given user. That makes negative dentries somewhat
> different from the other resource types that you are talking about.

So I agree they are somewhat different, but not fundamentally different -
e.g. the total number of files in the file system can easily be so high that
dentries + inodes cannot fit into RAM, and then you are in a very similar
situation as with negative dentries. That's actually one of the reasons why
people were trying to bend memcgs to account slab cache as well. But it
didn't end anywhere AFAIK.

The reason why I'm objecting is that the limit on the number of negative
dentries is another tuning knob, it is for very specific cases, and most
sysadmins will have no clue how to set it properly (even I wouldn't have a
good idea).

> >> Besides, Kevin had shown that keeping the dentry cache from growing
> >> too big was good for file lookup performance too.
> > Well, that rather speaks for a better data structure for dentry lookup
> > (e.g. growing hash tables) than for limiting negative dentries? I can
> > imagine there are workloads which would benefit from that as well?
>
> Current dentry lookup goes through a hash table. Lookup performance
> depends on the number of hash slots as well as the number of entries
> queued in each slot. In general, lookup performance deteriorates the more
> entries you put into a given slot, and that is true no matter how many
> slots you have allocated.

Agreed, but with rhashtables the number of slots grows dynamically with the
number of entries...

								Honza
On Mon, Sep 4, 2017 at 5:59 AM, Jan Kara <jack@suse.cz> wrote:
>
> The reason why I'm objecting is that the limit on the number of negative
> dentries is another tuning knob, it is for very specific cases, and most
> sysadmins will have no clue how to set it properly (even I wouldn't have a
> good idea).

We could make some cut at just doing it without any explicit tuning.

In particular, we could probably have some fairly trivial logic to count
negative dentries vs positive ones, and start pruning negative dentries if
we've crossed some threshold and we have more of them than of the positive
kind.

That's a fairly obvious "somebody is doing something bad" measure which
wouldn't need a ton of tuning.

That said, I'm not convinced it's a huge deal and worth it - I'm just saying
that I don't think it's necessarily crazy or something I'd NAK
automatically.

                 Linus
On 09/04/2017 08:59 AM, Jan Kara wrote:
>
> So I agree they are somewhat different, but not fundamentally different -
> e.g. the total number of files in the file system can easily be so high
> that dentries + inodes cannot fit into RAM, and then you are in a very
> similar situation as with negative dentries. That's actually one of the
> reasons why people were trying to bend memcgs to account slab cache as
> well. But it didn't end anywhere AFAIK.
>
> The reason why I'm objecting is that the limit on the number of negative
> dentries is another tuning knob, it is for very specific cases, and most
> sysadmins will have no clue how to set it properly (even I wouldn't have a
> good idea).

Thanks for letting me know which part of the patch you are objecting to. As
suggested by Linus, I can easily change the patch to do some kind of
auto-tuning depending on the positive/negative dentry ratio, without needing
a user-configurable kernel command line option. I added that option to make
the patch more flexible, but I do agree that most people will likely leave
it at the default value without ever using it.

>> Current dentry lookup is through a hash table. The lookup performance
>> will depend on the number of hashed slots as well as the number of
>> entries queued in each slot. So in general, lookup performance
>> deteriorates the more entries you put into a given slot. That is true no
>> matter how many slots you have allocated.
> Agreed, but with rhashtables the number of slots grows dynamically with
> the number of entries...

Currently, alloc_large_system_hash() scales the number of hash slots
according to the system memory size. That is adequate in most cases. Using
rhashtable would add a little bit of overhead to the hash index computation,
so we would probably see a little bit of slowdown with a small number of
dentries and a bit of speed-up with a large number of dentries. That may not
be a trade-off we would like to take.

Cheers,
Longman