[RFC,8/8] dcache: prevent flooding with negative dentries

Message ID 158894061332.200862.9812452563558764287.stgit@buzz (mailing list archive)
State New, archived
Series dcache: increase poison resistance

Commit Message

Konstantin Khlebnikov May 8, 2020, 12:23 p.m. UTC
Without memory pressure, the count of negative dentries isn't bounded.
They can consume all memory and drain all other inactive caches.

A typical scenario is an idle system where some process periodically creates
temporary files and removes them. After some time, memory fills up with
negative dentries for these random file names. Reclaiming them takes a while
because the slab allocator frees pages only when all related objects in them
are gone. Dentry lookup time is usually unaffected because the hash table
grows along with the size of memory, unless somebody deliberately crafts
hash collisions. A simple lookup of random names also generates negative
dentries very quickly.
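
For illustration, here is a minimal userspace sketch of such a workload
(this is not the actual 'dcache_stress' tool; the path, name pattern, and
iteration count are made up). Every stat() of a never-repeated name fails
with ENOENT and leaves an unreferenced negative dentry behind:

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

int main(void)
{
	struct stat st;
	char name[64];
	unsigned long i;

	for (i = 0; i < 100000000UL; i++) {
		/* each name is used exactly once, so it is never re-referenced */
		snprintf(name, sizeof(name), "/tmp/no-such-file-%lu-%d",
			 i, rand());
		stat(name, &st);	/* fails with ENOENT, caches a negative dentry */
	}
	return 0;
}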

This patch implements a heuristic that detects such scenarios and prevents
unbounded growth of completely unneeded negative dentries. It keeps up to
three of the most recent negative dentries in each bucket unless they were
referenced.

At the first dput of a negative dentry, when it is swept to the tail of its
siblings, we also clear its reference flag and look at the next dentries in
the hash chain, killing the third in a series of negative, unused, and
unreferenced dentries.

This way each hash bucket preserves up to three negative dentries, giving
them a chance to gain a reference and survive. Adding a positive or used
dentry into a hash chain also protects a few recent negative dentries. As a
result, the total size of the dcache is asymptotically limited by the number
of buckets and the number of positive or used dentries.

Before the patch, the tool 'dcache_stress' could fill all of memory with
dentries:

nr_dentry = 104913261   104.9M
nr_buckets = 8388608    12.5 avg
nr_unused = 104898729   100.0%
nr_negative = 104883218 100.0%

After this patch, the count of dentries saturates at around 3 per bucket:

nr_dentry = 24619259    24.6M
nr_buckets = 8388608    2.9 avg
nr_unused = 24605226    99.9%
nr_negative = 24600351  99.9%

This heuristic isn't bulletproof and solves only the most practical case.
It's easy to deceive: just touch the same random name twice.
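
A hypothetical sketch of such a deceiving workload (made-up paths): the
second lookup of each missing name gets the negative dentry referenced, so
the heuristic will keep it:

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
	struct stat st;
	char name[64];
	unsigned long i;

	for (i = 0; i < 100000000UL; i++) {
		snprintf(name, sizeof(name), "/tmp/no-such-file-%lu", i);
		stat(name, &st);	/* first miss creates the negative dentry */
		stat(name, &st);	/* second miss marks it referenced */
	}
	return 0;
}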

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 fs/dcache.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

Comments

Matthew Wilcox (Oracle) May 8, 2020, 2:56 p.m. UTC | #1
On Fri, May 08, 2020 at 03:23:33PM +0300, Konstantin Khlebnikov wrote:
> This patch implements a heuristic that detects such scenarios and prevents
> unbounded growth of completely unneeded negative dentries. It keeps up to
> three of the most recent negative dentries in each bucket unless they were
> referenced.
> 
> At the first dput of a negative dentry, when it is swept to the tail of its
> siblings, we also clear its reference flag and look at the next dentries in
> the hash chain, killing the third in a series of negative, unused, and
> unreferenced dentries.
> 
> This way each hash bucket preserves up to three negative dentries, giving
> them a chance to gain a reference and survive. Adding a positive or used
> dentry into a hash chain also protects a few recent negative dentries. As a
> result, the total size of the dcache is asymptotically limited by the number
> of buckets and the number of positive or used dentries.
> 
> This heuristic isn't bulletproof and solves only the most practical case.
> It's easy to deceive: just touch the same random name twice.

I'm not sure that's "easy to deceive" ... My concern with limiting
negative dentries is something like a kernel compilation, where there
are many include paths (11 for mm/mmap.c, 9 in general) and so there
will be a lot of places where <linux/fs.h> does not exist:

-isystem /usr/lib/gcc/x86_64-linux-gnu/9/include
-I../arch/x86/include
-I./arch/x86/include/generated
-I../include
-I./include
-I../arch/x86/include/uapi
-I./arch/x86/include/generated/uapi
-I../include/uapi
-I./include/generated/uapi
-I ../mm
-I ./mm
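
Each of those misses leaves a negative dentry behind. Roughly, the probing
looks like this minimal sketch (purely illustrative, with a shortened
directory list):

#include <stdio.h>
#include <sys/stat.h>

static const char *include_dirs[] = {
	"/usr/lib/gcc/x86_64-linux-gnu/9/include",
	"../arch/x86/include",
	"./arch/x86/include/generated",
	"../include",
	"./include",
};

int main(void)
{
	struct stat st;
	char path[256];
	unsigned int i;

	for (i = 0; i < sizeof(include_dirs) / sizeof(include_dirs[0]); i++) {
		snprintf(path, sizeof(path), "%s/linux/fs.h", include_dirs[i]);
		if (stat(path, &st) == 0) {	/* found: stop probing */
			printf("resolved: %s\n", path);
			return 0;
		}
		/* miss: the kernel caches a negative dentry for this path */
	}
	return 1;
}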

So it'd be good to know that kernel compilation times are unaffected by
this patch.
Konstantin Khlebnikov May 8, 2020, 4:29 p.m. UTC | #2
On 08/05/2020 17.56, Matthew Wilcox wrote:
> On Fri, May 08, 2020 at 03:23:33PM +0300, Konstantin Khlebnikov wrote:
>> This patch implements a heuristic that detects such scenarios and prevents
>> unbounded growth of completely unneeded negative dentries. It keeps up to
>> three of the most recent negative dentries in each bucket unless they were
>> referenced.
>>
>> At the first dput of a negative dentry, when it is swept to the tail of its
>> siblings, we also clear its reference flag and look at the next dentries in
>> the hash chain, killing the third in a series of negative, unused, and
>> unreferenced dentries.
>>
>> This way each hash bucket preserves up to three negative dentries, giving
>> them a chance to gain a reference and survive. Adding a positive or used
>> dentry into a hash chain also protects a few recent negative dentries. As a
>> result, the total size of the dcache is asymptotically limited by the number
>> of buckets and the number of positive or used dentries.
>>
>> This heuristic isn't bulletproof and solves only the most practical case.
>> It's easy to deceive: just touch the same random name twice.
> 
> I'm not sure that's "easy to deceive" ... My concern with limiting
> negative dentries is something like a kernel compilation, where there
> are many include paths (11 for mm/mmap.c, 9 in general) and so there
> will be a lot of places where <linux/fs.h> does not exist:
> 
> -isystem /usr/lib/gcc/x86_64-linux-gnu/9/include
> -I../arch/x86/include
> -I./arch/x86/include/generated
> -I../include
> -I./include
> -I../arch/x86/include/uapi
> -I./arch/x86/include/generated/uapi
> -I../include/uapi
> -I./include/generated/uapi
> -I ../mm
> -I ./mm
> 
> So it'd be good to know that kernel compilation times are unaffected by
> this patch.
> 

It's very unlikely that this patch changes anything for compilation,
or for any other scenario with a sane amount and rate of new names appearing.

This trims only dentries which have never been accessed twice.
Keeping 3 dentries per bucket gives a high chance that all of them get the
reference bit and stay in cache until the shrinker buries them.

To get a false positive from this heuristic, multiple newly created
negative dentries must hit one bucket in a short period of time,
i.e. at least three hash collisions are required.
Waiman Long May 8, 2020, 9:07 p.m. UTC | #3
On 5/8/20 8:23 AM, Konstantin Khlebnikov wrote:
> Without memory pressure, the count of negative dentries isn't bounded.
> They can consume all memory and drain all other inactive caches.
>
> A typical scenario is an idle system where some process periodically creates
> temporary files and removes them. After some time, memory fills up with
> negative dentries for these random file names. Reclaiming them takes a while
> because the slab allocator frees pages only when all related objects in them
> are gone. Dentry lookup time is usually unaffected because the hash table
> grows along with the size of memory, unless somebody deliberately crafts
> hash collisions. A simple lookup of random names also generates negative
> dentries very quickly.
>
> This patch implements a heuristic that detects such scenarios and prevents
> unbounded growth of completely unneeded negative dentries. It keeps up to
> three of the most recent negative dentries in each bucket unless they were
> referenced.
>
> At the first dput of a negative dentry, when it is swept to the tail of its
> siblings, we also clear its reference flag and look at the next dentries in
> the hash chain, killing the third in a series of negative, unused, and
> unreferenced dentries.
>
> This way each hash bucket preserves up to three negative dentries, giving
> them a chance to gain a reference and survive. Adding a positive or used
> dentry into a hash chain also protects a few recent negative dentries. As a
> result, the total size of the dcache is asymptotically limited by the number
> of buckets and the number of positive or used dentries.
>
> Before the patch, the tool 'dcache_stress' could fill all of memory with
> dentries:
>
> nr_dentry = 104913261   104.9M
> nr_buckets = 8388608    12.5 avg
> nr_unused = 104898729   100.0%
> nr_negative = 104883218 100.0%
>
> After this patch, the count of dentries saturates at around 3 per bucket:
>
> nr_dentry = 24619259    24.6M
> nr_buckets = 8388608    2.9 avg
> nr_unused = 24605226    99.9%
> nr_negative = 24600351  99.9%
>
> This heuristic isn't bulletproof and solves only the most practical case.
> It's easy to deceive: just touch the same random name twice.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> ---
>   fs/dcache.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 54 insertions(+)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 60158065891e..9f3d331b4978 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -632,6 +632,58 @@ static inline struct dentry *lock_parent(struct dentry *dentry)
>   	return __lock_parent(dentry);
>   }
>   
> +/*
> + * Called at first dput of each negative dentry.
> + * Prevents filling cache with never reused negative dentries.
> + *
> + * This clears reference and then looks at following dentries in hash chain.
> + * If they are negative, unused and unreferenced then keep two and kill third.
> + */
> +static void trim_negative(struct dentry *dentry)
> +	__releases(dentry->d_lock)
> +{
> +	struct dentry *victim, *parent;
> +	struct hlist_bl_node *next;
> +	int keep = 2;
> +
> +	rcu_read_lock();
> +
> +	dentry->d_flags &= ~DCACHE_REFERENCED;
> +	spin_unlock(&dentry->d_lock);
> +
> +	next = rcu_dereference_raw(dentry->d_hash.next);
> +	while (1) {
> +		victim = hlist_bl_entry(next, struct dentry, d_hash);
> +
> +		if (!next || d_count(victim) || !d_is_negative(victim) ||
> +		    (victim->d_flags & DCACHE_REFERENCED)) {
> +			rcu_read_unlock();
> +			return;
> +		}
> +
> +		if (!keep--)
> +			break;
> +
> +		next = rcu_dereference_raw(next->next);
> +	}
> +
> +	spin_lock(&victim->d_lock);
> +	parent = lock_parent(victim);
> +
> +	rcu_read_unlock();
> +
> +	if (d_count(victim) || !d_is_negative(victim) ||
> +	    (victim->d_flags & DCACHE_REFERENCED)) {
> +		if (parent)
> +			spin_unlock(&parent->d_lock);
> +		spin_unlock(&victim->d_lock);
> +		return;
> +	}
> +
> +	__dentry_kill(victim);
> +	dput(parent);
> +}

Since you are picking a victim from the hash list, I think it is better
to kill it only if it is already on the LRU. Otherwise, it could be in
the process of being instantiated or in the middle of some other operation.

Besides, I feel a bit uneasy about picking a random negative dentry to 
kill like that.

Cheers,
Longman

Patch

diff --git a/fs/dcache.c b/fs/dcache.c
index 60158065891e..9f3d331b4978 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -632,6 +632,58 @@  static inline struct dentry *lock_parent(struct dentry *dentry)
 	return __lock_parent(dentry);
 }
 
+/*
+ * Called at first dput of each negative dentry.
+ * Prevents filling cache with never reused negative dentries.
+ *
+ * This clears reference and then looks at following dentries in hash chain.
+ * If they are negative, unused and unreferenced then keep two and kill third.
+ */
+static void trim_negative(struct dentry *dentry)
+	__releases(dentry->d_lock)
+{
+	struct dentry *victim, *parent;
+	struct hlist_bl_node *next;
+	int keep = 2;
+
+	rcu_read_lock();
+
+	dentry->d_flags &= ~DCACHE_REFERENCED;
+	spin_unlock(&dentry->d_lock);
+
+	next = rcu_dereference_raw(dentry->d_hash.next);
+	while (1) {
+		victim = hlist_bl_entry(next, struct dentry, d_hash);
+
+		if (!next || d_count(victim) || !d_is_negative(victim) ||
+		    (victim->d_flags & DCACHE_REFERENCED)) {
+			rcu_read_unlock();
+			return;
+		}
+
+		if (!keep--)
+			break;
+
+		next = rcu_dereference_raw(next->next);
+	}
+
+	spin_lock(&victim->d_lock);
+	parent = lock_parent(victim);
+
+	rcu_read_unlock();
+
+	if (d_count(victim) || !d_is_negative(victim) ||
+	    (victim->d_flags & DCACHE_REFERENCED)) {
+		if (parent)
+			spin_unlock(&parent->d_lock);
+		spin_unlock(&victim->d_lock);
+		return;
+	}
+
+	__dentry_kill(victim);
+	dput(parent);
+}
+
 /*
  * Move cached negative dentry to the tail of parent->d_subdirs.
  * This lets walkers skip them all together at first sight.
@@ -655,6 +707,8 @@  static void sweep_negative(struct dentry *dentry)
 		}
 
 		spin_unlock(&parent->d_lock);
+
+		return trim_negative(dentry);
 	}
 out:
 	spin_unlock(&dentry->d_lock);