diff mbox series

vfs: don't mod negative dentry count when on shrinker list

Message ID 20240702170757.232130-1-bfoster@redhat.com (mailing list archive)
State New
Headers show
Series vfs: don't mod negative dentry count when on shrinker list | expand

Commit Message

Brian Foster July 2, 2024, 5:07 p.m. UTC
The nr_dentry_negative counter is intended to only account negative
dentries that are present on the superblock LRU. Therefore, the LRU
add, remove and isolate helpers modify the counter based on whether
the dentry is negative, but the shrinker list related helpers do not
modify the counter, and the paths that change a dentry between
positive and negative only do so if DCACHE_LRU_LIST is set.

The problem with this is that a dentry on a shrinker list still has
DCACHE_LRU_LIST set to indicate ->d_lru is in use. The additional
DCACHE_SHRINK_LIST flag denotes whether the dentry is on LRU or a
shrink related list. Therefore if a relevant operation (i.e. unlink)
occurs while a dentry is present on a shrinker list, and the
associated codepath only checks for DCACHE_LRU_LIST, then it is
technically possible to modify the negative dentry count for a
dentry that is off the LRU. Since the shrinker list related helpers
do not modify the negative dentry count (because non-LRU dentries
should not be included in the count) when the dentry is ultimately
removed from the shrinker list, this can cause the negative dentry
count to become permanently inaccurate.

This problem can be reproduced via a heavy file create/unlink vs.
drop_caches workload. On an 80xcpu system, I start 80 tasks each
running a 1k file create/delete loop, and one task spinning on
drop_caches. After 10 minutes or so of runtime, the idle/clean cache
negative dentry count increases from somewhere in the range of 5-10
entries to several hundred (and increasingly grows beyond
nr_dentry_unused).

Tweak the logic in the paths that turn a dentry negative or positive
to filter out the case where the dentry is present on a shrink
related list. This allows the above workload to maintain an accurate
negative dentry count.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/dcache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Josef Bacik July 2, 2024, 5:31 p.m. UTC | #1
On Tue, Jul 02, 2024 at 01:07:57PM -0400, Brian Foster wrote:
> The nr_dentry_negative counter is intended to only account negative
> dentries that are present on the superblock LRU. Therefore, the LRU
> add, remove and isolate helpers modify the counter based on whether
> the dentry is negative, but the shrinker list related helpers do not
> modify the counter, and the paths that change a dentry between
> positive and negative only do so if DCACHE_LRU_LIST is set.
> 
> The problem with this is that a dentry on a shrinker list still has
> DCACHE_LRU_LIST set to indicate ->d_lru is in use. The additional
> DCACHE_SHRINK_LIST flag denotes whether the dentry is on LRU or a
> shrink related list. Therefore if a relevant operation (i.e. unlink)
> occurs while a dentry is present on a shrinker list, and the
> associated codepath only checks for DCACHE_LRU_LIST, then it is
> technically possible to modify the negative dentry count for a
> dentry that is off the LRU. Since the shrinker list related helpers
> do not modify the negative dentry count (because non-LRU dentries
> should not be included in the count) when the dentry is ultimately
> removed from the shrinker list, this can cause the negative dentry
> count to become permanently inaccurate.
> 
> This problem can be reproduced via a heavy file create/unlink vs.
> drop_caches workload. On an 80xcpu system, I start 80 tasks each
> running a 1k file create/delete loop, and one task spinning on
> drop_caches. After 10 minutes or so of runtime, the idle/clean cache
> negative dentry count increases from somewhere in the range of 5-10
> entries to several hundred (and increasingly grows beyond
> nr_dentry_unused).
> 
> Tweak the logic in the paths that turn a dentry negative or positive
> to filter out the case where the dentry is present on a shrink
> related list. This allows the above workload to maintain an accurate
> negative dentry count.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

This is sort of a subtle interaction, it took me a bit to piece it together,
could you add a comment to the sections indicating the purpose of the extra
check?  Thanks,

Josef
Brian Foster July 2, 2024, 5:55 p.m. UTC | #2
On Tue, Jul 02, 2024 at 01:31:06PM -0400, Josef Bacik wrote:
> On Tue, Jul 02, 2024 at 01:07:57PM -0400, Brian Foster wrote:
> > The nr_dentry_negative counter is intended to only account negative
> > dentries that are present on the superblock LRU. Therefore, the LRU
> > add, remove and isolate helpers modify the counter based on whether
> > the dentry is negative, but the shrinker list related helpers do not
> > modify the counter, and the paths that change a dentry between
> > positive and negative only do so if DCACHE_LRU_LIST is set.
> > 
> > The problem with this is that a dentry on a shrinker list still has
> > DCACHE_LRU_LIST set to indicate ->d_lru is in use. The additional
> > DCACHE_SHRINK_LIST flag denotes whether the dentry is on LRU or a
> > shrink related list. Therefore if a relevant operation (i.e. unlink)
> > occurs while a dentry is present on a shrinker list, and the
> > associated codepath only checks for DCACHE_LRU_LIST, then it is
> > technically possible to modify the negative dentry count for a
> > dentry that is off the LRU. Since the shrinker list related helpers
> > do not modify the negative dentry count (because non-LRU dentries
> > should not be included in the count) when the dentry is ultimately
> > removed from the shrinker list, this can cause the negative dentry
> > count to become permanently inaccurate.
> > 
> > This problem can be reproduced via a heavy file create/unlink vs.
> > drop_caches workload. On an 80xcpu system, I start 80 tasks each
> > running a 1k file create/delete loop, and one task spinning on
> > drop_caches. After 10 minutes or so of runtime, the idle/clean cache
> > negative dentry count increases from somewhere in the range of 5-10
> > entries to several hundred (and increasingly grows beyond
> > nr_dentry_unused).
> > 
> > Tweak the logic in the paths that turn a dentry negative or positive
> > to filter out the case where the dentry is present on a shrink
> > related list. This allows the above workload to maintain an accurate
> > negative dentry count.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> 
> This is sort of a subtle interaction, it took me a bit to piece it together,
> could you add a comment to the sections indicating the purpose of the extra
> check?  Thanks,
> 

Sure.. I briefly considered whether something like a d_is_lru() or some
such helper might be more useful and/or descriptive, but I didn't think
too hard on it. That would also leave one place to add a comment instead
of two, but that's not a big deal either.

I'll wait a bit for any additional feedback and send a v2 with that
tweak. Thanks for the review.

Brian

> Josef
>
Ian Kent July 3, 2024, 1:48 a.m. UTC | #3
On 3/7/24 01:07, Brian Foster wrote:
> The nr_dentry_negative counter is intended to only account negative
> dentries that are present on the superblock LRU. Therefore, the LRU
> add, remove and isolate helpers modify the counter based on whether
> the dentry is negative, but the shrinker list related helpers do not
> modify the counter, and the paths that change a dentry between
> positive and negative only do so if DCACHE_LRU_LIST is set.
>
> The problem with this is that a dentry on a shrinker list still has
> DCACHE_LRU_LIST set to indicate ->d_lru is in use. The additional
> DCACHE_SHRINK_LIST flag denotes whether the dentry is on LRU or a
> shrink related list. Therefore if a relevant operation (i.e. unlink)
> occurs while a dentry is present on a shrinker list, and the
> associated codepath only checks for DCACHE_LRU_LIST, then it is
> technically possible to modify the negative dentry count for a
> dentry that is off the LRU. Since the shrinker list related helpers
> do not modify the negative dentry count (because non-LRU dentries
> should not be included in the count) when the dentry is ultimately
> removed from the shrinker list, this can cause the negative dentry
> count to become permanently inaccurate.
>
> This problem can be reproduced via a heavy file create/unlink vs.
> drop_caches workload. On an 80xcpu system, I start 80 tasks each
> running a 1k file create/delete loop, and one task spinning on
> drop_caches. After 10 minutes or so of runtime, the idle/clean cache
> negative dentry count increases from somewhere in the range of 5-10
> entries to several hundred (and increasingly grows beyond
> nr_dentry_unused).
>
> Tweak the logic in the paths that turn a dentry negative or positive
> to filter out the case where the dentry is present on a shrink
> related list. This allows the above workload to maintain an accurate
> negative dentry count.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>   fs/dcache.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 407095188f83..5305b95b3030 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -355,7 +355,7 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
>   	flags &= ~DCACHE_ENTRY_TYPE;
>   	WRITE_ONCE(dentry->d_flags, flags);
>   	dentry->d_inode = NULL;
> -	if (flags & DCACHE_LRU_LIST)
> +	if ((flags & (DCACHE_LRU_LIST|DCACHE_SHRINK_LIST)) == DCACHE_LRU_LIST)
>   		this_cpu_inc(nr_dentry_negative);
>   }
>   
> @@ -1846,7 +1846,8 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
>   	/*
>   	 * Decrement negative dentry count if it was in the LRU list.
>   	 */
> -	if (dentry->d_flags & DCACHE_LRU_LIST)
> +	if ((dentry->d_flags &
> +	     (DCACHE_LRU_LIST|DCACHE_SHRINK_LIST)) == DCACHE_LRU_LIST)
>   		this_cpu_dec(nr_dentry_negative);
>   	hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
>   	raw_write_seqcount_begin(&dentry->d_seq);


Acked-by: Ian Kent <ikent@redhat.com>


Christian, just thought I'd call your attention to this since it's a bit 
urgent for us to get reviews

and hopefully merged into the VFS tree.


Ian
Christian Brauner July 3, 2024, 8:22 a.m. UTC | #4
On Wed, Jul 03, 2024 at 09:48:52AM GMT, Ian Kent wrote:
> On 3/7/24 01:07, Brian Foster wrote:
> > The nr_dentry_negative counter is intended to only account negative
> > dentries that are present on the superblock LRU. Therefore, the LRU
> > add, remove and isolate helpers modify the counter based on whether
> > the dentry is negative, but the shrinker list related helpers do not
> > modify the counter, and the paths that change a dentry between
> > positive and negative only do so if DCACHE_LRU_LIST is set.
> > 
> > The problem with this is that a dentry on a shrinker list still has
> > DCACHE_LRU_LIST set to indicate ->d_lru is in use. The additional
> > DCACHE_SHRINK_LIST flag denotes whether the dentry is on LRU or a
> > shrink related list. Therefore if a relevant operation (i.e. unlink)
> > occurs while a dentry is present on a shrinker list, and the
> > associated codepath only checks for DCACHE_LRU_LIST, then it is
> > technically possible to modify the negative dentry count for a
> > dentry that is off the LRU. Since the shrinker list related helpers
> > do not modify the negative dentry count (because non-LRU dentries
> > should not be included in the count) when the dentry is ultimately
> > removed from the shrinker list, this can cause the negative dentry
> > count to become permanently inaccurate.
> > 
> > This problem can be reproduced via a heavy file create/unlink vs.
> > drop_caches workload. On an 80xcpu system, I start 80 tasks each
> > running a 1k file create/delete loop, and one task spinning on
> > drop_caches. After 10 minutes or so of runtime, the idle/clean cache
> > negative dentry count increases from somewhere in the range of 5-10
> > entries to several hundred (and increasingly grows beyond
> > nr_dentry_unused).
> > 
> > Tweak the logic in the paths that turn a dentry negative or positive
> > to filter out the case where the dentry is present on a shrink
> > related list. This allows the above workload to maintain an accurate
> > negative dentry count.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >   fs/dcache.c | 5 +++--
> >   1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index 407095188f83..5305b95b3030 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -355,7 +355,7 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
> >   	flags &= ~DCACHE_ENTRY_TYPE;
> >   	WRITE_ONCE(dentry->d_flags, flags);
> >   	dentry->d_inode = NULL;
> > -	if (flags & DCACHE_LRU_LIST)
> > +	if ((flags & (DCACHE_LRU_LIST|DCACHE_SHRINK_LIST)) == DCACHE_LRU_LIST)
> >   		this_cpu_inc(nr_dentry_negative);
> >   }
> > @@ -1846,7 +1846,8 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
> >   	/*
> >   	 * Decrement negative dentry count if it was in the LRU list.
> >   	 */
> > -	if (dentry->d_flags & DCACHE_LRU_LIST)
> > +	if ((dentry->d_flags &
> > +	     (DCACHE_LRU_LIST|DCACHE_SHRINK_LIST)) == DCACHE_LRU_LIST)
> >   		this_cpu_dec(nr_dentry_negative);
> >   	hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
> >   	raw_write_seqcount_begin(&dentry->d_seq);
> 
> 
> Acked-by: Ian Kent <ikent@redhat.com>
> 
> 
> Christian, just thought I'd call your attention to this since it's a bit
> urgent for us to get reviews
> 
> and hopefully merged into the VFS tree.

I'm about to pick it up.
Christian Brauner July 3, 2024, 8:22 a.m. UTC | #5
On Tue, 02 Jul 2024 13:07:57 -0400, Brian Foster wrote:
> The nr_dentry_negative counter is intended to only account negative
> dentries that are present on the superblock LRU. Therefore, the LRU
> add, remove and isolate helpers modify the counter based on whether
> the dentry is negative, but the shrinker list related helpers do not
> modify the counter, and the paths that change a dentry between
> positive and negative only do so if DCACHE_LRU_LIST is set.
> 
> [...]

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/1] vfs: don't mod negative dentry count when on shrinker list
      https://git.kernel.org/vfs/vfs/c/e161afd05b24
Ian Kent July 4, 2024, 1:04 a.m. UTC | #6
On 3/7/24 16:22, Christian Brauner wrote:
> On Wed, Jul 03, 2024 at 09:48:52AM GMT, Ian Kent wrote:
>> On 3/7/24 01:07, Brian Foster wrote:
>>> The nr_dentry_negative counter is intended to only account negative
>>> dentries that are present on the superblock LRU. Therefore, the LRU
>>> add, remove and isolate helpers modify the counter based on whether
>>> the dentry is negative, but the shrinker list related helpers do not
>>> modify the counter, and the paths that change a dentry between
>>> positive and negative only do so if DCACHE_LRU_LIST is set.
>>>
>>> The problem with this is that a dentry on a shrinker list still has
>>> DCACHE_LRU_LIST set to indicate ->d_lru is in use. The additional
>>> DCACHE_SHRINK_LIST flag denotes whether the dentry is on LRU or a
>>> shrink related list. Therefore if a relevant operation (i.e. unlink)
>>> occurs while a dentry is present on a shrinker list, and the
>>> associated codepath only checks for DCACHE_LRU_LIST, then it is
>>> technically possible to modify the negative dentry count for a
>>> dentry that is off the LRU. Since the shrinker list related helpers
>>> do not modify the negative dentry count (because non-LRU dentries
>>> should not be included in the count) when the dentry is ultimately
>>> removed from the shrinker list, this can cause the negative dentry
>>> count to become permanently inaccurate.
>>>
>>> This problem can be reproduced via a heavy file create/unlink vs.
>>> drop_caches workload. On an 80xcpu system, I start 80 tasks each
>>> running a 1k file create/delete loop, and one task spinning on
>>> drop_caches. After 10 minutes or so of runtime, the idle/clean cache
>>> negative dentry count increases from somewhere in the range of 5-10
>>> entries to several hundred (and increasingly grows beyond
>>> nr_dentry_unused).
>>>
>>> Tweak the logic in the paths that turn a dentry negative or positive
>>> to filter out the case where the dentry is present on a shrink
>>> related list. This allows the above workload to maintain an accurate
>>> negative dentry count.
>>>
>>> Signed-off-by: Brian Foster <bfoster@redhat.com>
>>> ---
>>>    fs/dcache.c | 5 +++--
>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/dcache.c b/fs/dcache.c
>>> index 407095188f83..5305b95b3030 100644
>>> --- a/fs/dcache.c
>>> +++ b/fs/dcache.c
>>> @@ -355,7 +355,7 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
>>>    	flags &= ~DCACHE_ENTRY_TYPE;
>>>    	WRITE_ONCE(dentry->d_flags, flags);
>>>    	dentry->d_inode = NULL;
>>> -	if (flags & DCACHE_LRU_LIST)
>>> +	if ((flags & (DCACHE_LRU_LIST|DCACHE_SHRINK_LIST)) == DCACHE_LRU_LIST)
>>>    		this_cpu_inc(nr_dentry_negative);
>>>    }
>>> @@ -1846,7 +1846,8 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
>>>    	/*
>>>    	 * Decrement negative dentry count if it was in the LRU list.
>>>    	 */
>>> -	if (dentry->d_flags & DCACHE_LRU_LIST)
>>> +	if ((dentry->d_flags &
>>> +	     (DCACHE_LRU_LIST|DCACHE_SHRINK_LIST)) == DCACHE_LRU_LIST)
>>>    		this_cpu_dec(nr_dentry_negative);
>>>    	hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
>>>    	raw_write_seqcount_begin(&dentry->d_seq);
>>
>> Acked-by: Ian Kent <ikent@redhat.com>
>>
>>
>> Christian, just thought I'd call your attention to this since it's a bit
>> urgent for us to get reviews
>>
>> and hopefully merged into the VFS tree.
> I'm about to pick it up.

Thanks much, sounds like there's a v2 on the way (which could have done 
by now).


Ian
diff mbox series

Patch

diff --git a/fs/dcache.c b/fs/dcache.c
index 407095188f83..5305b95b3030 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -355,7 +355,7 @@  static inline void __d_clear_type_and_inode(struct dentry *dentry)
 	flags &= ~DCACHE_ENTRY_TYPE;
 	WRITE_ONCE(dentry->d_flags, flags);
 	dentry->d_inode = NULL;
-	if (flags & DCACHE_LRU_LIST)
+	if ((flags & (DCACHE_LRU_LIST|DCACHE_SHRINK_LIST)) == DCACHE_LRU_LIST)
 		this_cpu_inc(nr_dentry_negative);
 }
 
@@ -1846,7 +1846,8 @@  static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 	/*
 	 * Decrement negative dentry count if it was in the LRU list.
 	 */
-	if (dentry->d_flags & DCACHE_LRU_LIST)
+	if ((dentry->d_flags &
+	     (DCACHE_LRU_LIST|DCACHE_SHRINK_LIST)) == DCACHE_LRU_LIST)
 		this_cpu_dec(nr_dentry_negative);
 	hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
 	raw_write_seqcount_begin(&dentry->d_seq);