[1/3] bcache: ignore pending signals when creating gc and allocator thread

Message ID 20200213141207.77219-2-colyli@suse.de (mailing list archive)
State New, archived
Series bcache patches for Linux v5.6-rc2

Commit Message

Coly Li Feb. 13, 2020, 2:12 p.m. UTC
When running a cache set, all the bcache btree nodes of this cache set
will be checked by bch_btree_check(). If the bcache btree is very
large, iterating all the btree nodes will occupy too much system memory
and the bcache registering process might be selected and killed by the
system OOM killer. kthread_run() will fail if the current process has a
pending signal, therefore the kthread creation in run_cache_set() for
the gc and allocator kernel threads will very probably fail for a very
large bcache btree.

Indeed such an OOM is safe and the registering process will exit after
the registration is done. Therefore this patch flushes pending signals
during the cache set start up, specifically in
bch_cache_allocator_start() and bch_gc_thread_start(), to make sure
run_cache_set() won't fail for a large cached data set.

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/alloc.c | 18 ++++++++++++++++--
 drivers/md/bcache/btree.c | 13 +++++++++++++
 2 files changed, 29 insertions(+), 2 deletions(-)

Comments

Christoph Hellwig Feb. 19, 2020, 4:32 p.m. UTC | #1
On Thu, Feb 13, 2020 at 10:12:05PM +0800, Coly Li wrote:
> +	/*
> +	 * The previous btree check operation may have occupied too
> +	 * much system memory for the bcache btree node cache, and the
> +	 * registering process may have been selected by the OOM
> +	 * killer. Ignore the SIGKILL sent by the OOM killer, if there
> +	 * is one, to avoid kthread_run() failing because of pending
> +	 * signals. The bcache registering process will exit after the
> +	 * registration is done.
> +	 */
> +	if (signal_pending(current))
> +		flush_signals(current);
> +
> +	k = kthread_run(bch_allocator_thread, ca, "bcache_allocator");

This really needs to go into the kthread code itself instead of
requiring cargo culting in the callers.
Coly Li Feb. 20, 2020, 1:20 p.m. UTC | #2
On 2020/2/20 12:32 AM, Christoph Hellwig wrote:
> On Thu, Feb 13, 2020 at 10:12:05PM +0800, Coly Li wrote:
>> +	/*
>> +	 * The previous btree check operation may have occupied too
>> +	 * much system memory for the bcache btree node cache, and the
>> +	 * registering process may have been selected by the OOM
>> +	 * killer. Ignore the SIGKILL sent by the OOM killer, if there
>> +	 * is one, to avoid kthread_run() failing because of pending
>> +	 * signals. The bcache registering process will exit after the
>> +	 * registration is done.
>> +	 */
>> +	if (signal_pending(current))
>> +		flush_signals(current);
>> +
>> +	k = kthread_run(bch_allocator_thread, ca, "bcache_allocator");
> 
> This really needs to go into the kthread code itself instead of
> requiring cargo culting in the callers.
> 

Hi Christoph,

Correct me if I am wrong.

If a signal is pending before calling kthread_run(), kthread_run() will
fail and return -EINTR, as the code and comments in
__kthread_create_on_node() explain:

	/*
	 * Wait for completion in killable state, for I might be chosen by
	 * the OOM killer while kthreadd is trying to allocate memory for
	 * new kernel thread.
	 */
	if (unlikely(wait_for_completion_killable(&done))) {
		/*
		 * If I was SIGKILLed before kthreadd (or new kernel thread)
		 * calls complete(), leave the cleanup of this structure to
		 * that thread.
		 */
		if (xchg(&create->done, NULL))
			return ERR_PTR(-EINTR);
		/*
		 * kthreadd (or new kernel thread) will call complete()
		 * shortly.
		 */
		wait_for_completion(&done);
	}

So the caller of kthread_run(), in this case bch_cache_allocator_start(),
will receive -EINTR and return the error to its caller
bch_cache_set_alloc(). Then the registration will fail, regardless of
what the kthread routine does in parallel.

Therefore I need to explicitly call flush_signals() before calling
kthread_run().
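
Condensed, the pre-patch path described above looks like the sketch
below. It is based on the lines removed by the first hunk at the bottom
of this page; the tail of the function is elided and the comments are
annotations for this discussion, not code from the tree.

/* Pre-patch bch_cache_allocator_start(), condensed (illustrative
 * sketch): if the registering process still carries the SIGKILL queued
 * by the OOM killer, __kthread_create_on_node() bails out of
 * wait_for_completion_killable() and kthread_run() returns
 * ERR_PTR(-EINTR).
 */
int bch_cache_allocator_start(struct cache *ca)
{
	struct task_struct *k = kthread_run(bch_allocator_thread,
					    ca, "bcache_allocator");

	if (IS_ERR(k))
		return PTR_ERR(k);	/* -EINTR propagates; registration fails */

	/* ... remainder unchanged: record the thread and return 0 ... */
	return 0;
}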
Christoph Hellwig Feb. 20, 2020, 4:38 p.m. UTC | #3
On Thu, Feb 20, 2020 at 09:20:49PM +0800, Coly Li wrote:
> Therefore I need to explicitly call flush_signals() before calling
> kthread_run().

Right now you have to.  But the proper fix is to not require this and
fix kthread_run to work from a thread that has been selected by the OOM
killer.  In the most trivial version that means moving your code into
kthread_run, but there are probably better ways as well.
Coly Li Feb. 20, 2020, 4:47 p.m. UTC | #4
On 2020/2/21 12:38 AM, Christoph Hellwig wrote:
> On Thu, Feb 20, 2020 at 09:20:49PM +0800, Coly Li wrote:
>> Therefore I need to explicitly call flush_signals() before calling
>> kthread_run().
> 
> Right now you have to.  But the proper fix is to not require this and
> fix kthread_run to work from a thread that has been selected by the OOM
> killer.  In the most trivial version that means moving your code into
> kthread_run, but there are probably better ways as well.
> 

Yes, I see. Let me think about it; at this moment kthread_run() is
still not very clear in my mind.

Thanks for the hint.
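
For reference, a minimal sketch of the direction Christoph suggests,
i.e. doing the flush inside a kthread_run()-style helper instead of in
every caller. The helper name and its placement are assumptions made
for illustration only; nothing like this is part of the series.

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/sched/signal.h>

/*
 * Hypothetical helper, not part of this series: drop a pending SIGKILL
 * (e.g. queued by the OOM killer against the registering process)
 * before creating the kernel thread, so callers such as
 * bch_cache_allocator_start() would not need to do it themselves.
 */
static struct task_struct *kthread_run_flush_signals(int (*threadfn)(void *data),
						     void *data,
						     const char *name)
{
	struct task_struct *k;

	if (signal_pending(current))
		flush_signals(current);

	k = kthread_create(threadfn, data, "%s", name);
	if (!IS_ERR(k))
		wake_up_process(k);

	return k;
}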

Patch

diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
index a1df0d95151c..8bc1faf71ff2 100644
--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -67,6 +67,7 @@ 
 #include <linux/blkdev.h>
 #include <linux/kthread.h>
 #include <linux/random.h>
+#include <linux/sched/signal.h>
 #include <trace/events/bcache.h>
 
 #define MAX_OPEN_BUCKETS 128
@@ -733,8 +734,21 @@  int bch_open_buckets_alloc(struct cache_set *c)
 
 int bch_cache_allocator_start(struct cache *ca)
 {
-	struct task_struct *k = kthread_run(bch_allocator_thread,
-					    ca, "bcache_allocator");
+	struct task_struct *k;
+
+	/*
+	 * The previous btree check operation may have occupied too
+	 * much system memory for the bcache btree node cache, and the
+	 * registering process may have been selected by the OOM
+	 * killer. Ignore the SIGKILL sent by the OOM killer, if there
+	 * is one, to avoid kthread_run() failing because of pending
+	 * signals. The bcache registering process will exit after the
+	 * registration is done.
+	 */
+	if (signal_pending(current))
+		flush_signals(current);
+
+	k = kthread_run(bch_allocator_thread, ca, "bcache_allocator");
 	if (IS_ERR(k))
 		return PTR_ERR(k);
 
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 1e020bd7f5ac..4009023305a3 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -34,6 +34,7 @@ 
 #include <linux/random.h>
 #include <linux/rcupdate.h>
 #include <linux/sched/clock.h>
+#include <linux/sched/signal.h>
 #include <linux/rculist.h>
 #include <linux/delay.h>
 #include <trace/events/bcache.h>
@@ -1913,6 +1914,18 @@  static int bch_gc_thread(void *arg)
 
 int bch_gc_thread_start(struct cache_set *c)
 {
+	/*
+	 * The previous btree check operation may have occupied too
+	 * much system memory for the bcache btree node cache, and the
+	 * registering process may have been selected by the OOM
+	 * killer. Ignore the SIGKILL sent by the OOM killer, if there
+	 * is one, to avoid kthread_run() failing because of pending
+	 * signals. The bcache registering process will exit after the
+	 * registration is done.
+	 */
+	if (signal_pending(current))
+		flush_signals(current);
+
 	c->gc_thread = kthread_run(bch_gc_thread, c, "bcache_gc");
 	return PTR_ERR_OR_ZERO(c->gc_thread);
 }