From patchwork Mon Mar 19 00:36:16 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Lyle <mlyle@lyle.org>
X-Patchwork-Id: 10291459
Return-Path: <linux-block-owner@kernel.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	CAE75600F6 for <patchwork-linux-block@patchwork.kernel.org>;
	Mon, 19 Mar 2018 00:36:49 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BC1EF28EDF
	for <patchwork-linux-block@patchwork.kernel.org>;
	Mon, 19 Mar 2018 00:36:49 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id B0EBD28F56; Mon, 19 Mar 2018 00:36:49 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E9E6128EDF
	for <patchwork-linux-block@patchwork.kernel.org>;
	Mon, 19 Mar 2018 00:36:48 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754651AbeCSAgq (ORCPT
	<rfc822;patchwork-linux-block@patchwork.kernel.org>);
	Sun, 18 Mar 2018 20:36:46 -0400
Received: from mail-pg0-f66.google.com ([74.125.83.66]:38919 "EHLO
	mail-pg0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754685AbeCSAgn (ORCPT
	<rfc822;linux-block@vger.kernel.org>);
	Sun, 18 Mar 2018 20:36:43 -0400
Received: by mail-pg0-f66.google.com with SMTP id a19so452185pgw.6
	for <linux-block@vger.kernel.org>;
	Sun, 18 Mar 2018 17:36:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=lyle-org.20150623.gappssmtp.com; s=20150623;
	h=from:to:cc:subject:date:message-id:in-reply-to:references;
	bh=x0fKGoR8v7KzGqqDdSC4VHIGsuRIlR8ZHX7ap5IKT9w=;
	b=r6/WPiLcLde6ZBmYnQD3IXzKf2mrY6I+6hdkVsVvU6/QSf4qnM6UJ60X08FEEIfp7j
	U//VOTWXygJv5xnzHgIMkhMEvtvwguhX5l2r2aHOU4/IU6rDWU+zk35Nl1JKQZTnEkBL
	CwVTa8J73dFK/q/JuEdGj7K55wYwBMy4X+c+ynP2gQNRUOyCfjiGnWX06noCfIYRN3h4
	dyrW1ifumd8y0xKgA0ugwiqvgBIPR94ZUwNoi45QGaRO3VTBVamaNoL8agfz3eaczm97
	IU2LJxRPxZaTyRribs/OxcH2iebcgWRwPLx6p/6y4saykXWh3+70ROryNjNZ/WuNwp9p
	wWFg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
	:references;
	bh=x0fKGoR8v7KzGqqDdSC4VHIGsuRIlR8ZHX7ap5IKT9w=;
	b=uCSPAPQdIiQt+oBVkWvf3nvUlPcJqtlD+eYyIw/i+J6YUmaqzJ9EGkikt1fScYtbdg
	iB3Prawyx4/l/Z/xSVhbTpDdXtOPnKArWbr/ZpBKiwYimJHXIy0B7I4SyVMYeZ6RR+3G
	lNLjWGKjvdSEO3R3ikSHxswL9X/f2ic8dLiXktyrLK1R9Mu7gIXIHo/enSDNMWhGFoJW
	0KP4T+PJQRuUlcYyhNyDgl6BxjVlMxLxDCovqJe50lfqJgXQw/eJvkrAf0SHAsa6/+NM
	MTmIdHGlyWQ1CeMFL3D00AQsVYQXcJaeeqyfZK0x9TjqzyulzMa/0hr+pr3tH2ojzL/P
	xOtg==
X-Gm-Message-State: AElRT7GOeMWyrBQETaMcICqwa6qtoTO+xE0eg6V3nrTtkHojpn0vG36k
	wIAhmZfiI0gerW/Tt98UzG+QVA==
X-Google-Smtp-Source: 
 AG47ELs3VwkhloxLcc2JQn9Zg6iIJSwRjSjZD+DoVNyzVrfmzrTdVLy5KbUVv2y48hFfOF9SpVd9kg==
X-Received: by 10.98.22.144 with SMTP id 138mr8610455pfw.114.1521419803066;
	Sun, 18 Mar 2018 17:36:43 -0700 (PDT)
Received: from midnight.lan
	(2600-6c52-6200-09b7-0000-0000-0000-0d66.dhcp6.chtrptr.net.
	[2600:6c52:6200:9b7::d66]) by smtp.gmail.com with ESMTPSA id
	a17sm27674857pfc.122.2018.03.18.17.36.41
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Sun, 18 Mar 2018 17:36:42 -0700 (PDT)
From: Michael Lyle <mlyle@lyle.org>
To: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org
Cc: axboe@fb.com, Coly Li <colyli@suse.de>,
	Michael Lyle <mlyle@lyle.org>, Hannes Reinecke <hare@suse.com>
Subject: [for-4.17 03/20] bcache: stop dc->writeback_rate_update properly
Date: Sun, 18 Mar 2018 17:36:16 -0700
Message-Id: <20180319003633.27225-4-mlyle@lyle.org>
X-Mailer: git-send-email 2.14.1
In-Reply-To: <20180319003633.27225-1-mlyle@lyle.org>
References: <20180319003633.27225-1-mlyle@lyle.org>
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-block.vger.kernel.org>
X-Mailing-List: linux-block@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

From: Coly Li <colyli@suse.de>

struct delayed_work writeback_rate_update in struct cache_dev is a delayed
worker to call function update_writeback_rate() in period (the interval is
defined by dc->writeback_rate_update_seconds).

When a metadate I/O error happens on cache device, bcache error handling
routine bch_cache_set_error() will call bch_cache_set_unregister() to
retire whole cache set. On the unregister code path, this delayed work is
stopped by calling cancel_delayed_work_sync(&dc->writeback_rate_update).

dc->writeback_rate_update is a special delayed work from others in bcache.
In its routine update_writeback_rate(), this delayed work is re-armed
itself. That means when cancel_delayed_work_sync() returns, this delayed
work can still be executed after several seconds defined by
dc->writeback_rate_update_seconds.

The problem is, after cancel_delayed_work_sync() returns, the cache set
unregister code path will continue and release memory of struct cache set.
Then the delayed work is scheduled to run, __update_writeback_rate()
will reference the already released cache_set memory, and trigger a NULL
pointer deference fault.

This patch introduces two more bcache device flags,
- BCACHE_DEV_WB_RUNNING
  bit set:  bcache device is in writeback mode and running, it is OK for
            dc->writeback_rate_update to re-arm itself.
  bit clear:bcache device is trying to stop dc->writeback_rate_update,
            this delayed work should not re-arm itself and quit.
- BCACHE_DEV_RATE_DW_RUNNING
  bit set:  routine update_writeback_rate() is executing.
  bit clear: routine update_writeback_rate() quits.

This patch also adds a function cancel_writeback_rate_update_dwork() to
wait for dc->writeback_rate_update quits before cancel it by calling
cancel_delayed_work_sync(). In order to avoid a deadlock by unexpected
quit dc->writeback_rate_update, after time_out seconds this function will
give up and continue to call cancel_delayed_work_sync().

And here I explain how this patch stops self re-armed delayed work properly
with the above stuffs.

update_writeback_rate() sets BCACHE_DEV_RATE_DW_RUNNING at its beginning
and clears BCACHE_DEV_RATE_DW_RUNNING at its end. Before calling
cancel_writeback_rate_update_dwork() clear flag BCACHE_DEV_WB_RUNNING.

Before calling cancel_delayed_work_sync() wait utill flag
BCACHE_DEV_RATE_DW_RUNNING is clear. So when calling
cancel_delayed_work_sync(), dc->writeback_rate_update must be already re-
armed, or quite by seeing BCACHE_DEV_WB_RUNNING cleared. In both cases
delayed work routine update_writeback_rate() won't be executed after
cancel_delayed_work_sync() returns.

Inside update_writeback_rate() before calling schedule_delayed_work(), flag
BCACHE_DEV_WB_RUNNING is checked before. If this flag is cleared, it means
someone is about to stop the delayed work. Because flag
BCACHE_DEV_RATE_DW_RUNNING is set already and cancel_delayed_work_sync()
has to wait for this flag to be cleared, we don't need to worry about race
condition here.

If update_writeback_rate() is scheduled to run after checking
BCACHE_DEV_RATE_DW_RUNNING and before calling cancel_delayed_work_sync()
in cancel_writeback_rate_update_dwork(), it is also safe. Because at this
moment BCACHE_DEV_WB_RUNNING is cleared with memory barrier. As I mentioned
previously, update_writeback_rate() will see BCACHE_DEV_WB_RUNNING is clear
and quit immediately.

Because there are more dependences inside update_writeback_rate() to struct
cache_set memory, dc->writeback_rate_update is not a simple self re-arm
delayed work. After trying many different methods (e.g. hold dc->count, or
use locks), this is the only way I can find which works to properly stop
dc->writeback_rate_update delayed work.

Changelog:
v3: change values of BCACHE_DEV_WB_RUNNING and BCACHE_DEV_RATE_DW_RUNNING
    to bit index, for test_bit().
v2: Try to fix the race issue which is pointed out by Junhui.
v1: The initial version for review

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Junhui Tang <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Cc: Michael Lyle <mlyle@lyle.org>
Cc: Hannes Reinecke <hare@suse.com>
---
 drivers/md/bcache/bcache.h    |  9 +++++----
 drivers/md/bcache/super.c     | 38 ++++++++++++++++++++++++++++++++++----
 drivers/md/bcache/sysfs.c     |  3 ++-
 drivers/md/bcache/writeback.c | 29 ++++++++++++++++++++++++++++-
 4 files changed, 69 insertions(+), 10 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 12e5197f186c..b5ddb848cd31 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -258,10 +258,11 @@ struct bcache_device {
 	struct gendisk		*disk;
 
 	unsigned long		flags;
-#define BCACHE_DEV_CLOSING	0
-#define BCACHE_DEV_DETACHING	1
-#define BCACHE_DEV_UNLINK_DONE	2
-
+#define BCACHE_DEV_CLOSING		0
+#define BCACHE_DEV_DETACHING		1
+#define BCACHE_DEV_UNLINK_DONE		2
+#define BCACHE_DEV_WB_RUNNING		3
+#define BCACHE_DEV_RATE_DW_RUNNING	4
 	unsigned		nr_stripes;
 	unsigned		stripe_size;
 	atomic_t		*stripe_sectors_dirty;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index f4dea0209fce..29aa66ad33be 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -899,6 +899,31 @@ void bch_cached_dev_run(struct cached_dev *dc)
 		pr_debug("error creating sysfs link");
 }
 
+/*
+ * If BCACHE_DEV_RATE_DW_RUNNING is set, it means routine of the delayed
+ * work dc->writeback_rate_update is running. Wait until the routine
+ * quits (BCACHE_DEV_RATE_DW_RUNNING is clear), then continue to
+ * cancel it. If BCACHE_DEV_RATE_DW_RUNNING is not clear after time_out
+ * seconds, give up waiting here and continue to cancel it too.
+ */
+static void cancel_writeback_rate_update_dwork(struct cached_dev *dc)
+{
+	int time_out = WRITEBACK_RATE_UPDATE_SECS_MAX * HZ;
+
+	do {
+		if (!test_bit(BCACHE_DEV_RATE_DW_RUNNING,
+			      &dc->disk.flags))
+			break;
+		time_out--;
+		schedule_timeout_interruptible(1);
+	} while (time_out > 0);
+
+	if (time_out == 0)
+		pr_warn("give up waiting for dc->writeback_write_update to quit");
+
+	cancel_delayed_work_sync(&dc->writeback_rate_update);
+}
+
 static void cached_dev_detach_finish(struct work_struct *w)
 {
 	struct cached_dev *dc = container_of(w, struct cached_dev, detach);
@@ -911,7 +936,9 @@ static void cached_dev_detach_finish(struct work_struct *w)
 
 	mutex_lock(&bch_register_lock);
 
-	cancel_delayed_work_sync(&dc->writeback_rate_update);
+	if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
+		cancel_writeback_rate_update_dwork(dc);
+
 	if (!IS_ERR_OR_NULL(dc->writeback_thread)) {
 		kthread_stop(dc->writeback_thread);
 		dc->writeback_thread = NULL;
@@ -954,6 +981,7 @@ void bch_cached_dev_detach(struct cached_dev *dc)
 	closure_get(&dc->disk.cl);
 
 	bch_writeback_queue(dc);
+
 	cached_dev_put(dc);
 }
 
@@ -1092,14 +1120,16 @@ static void cached_dev_free(struct closure *cl)
 {
 	struct cached_dev *dc = container_of(cl, struct cached_dev, disk.cl);
 
-	cancel_delayed_work_sync(&dc->writeback_rate_update);
+	mutex_lock(&bch_register_lock);
+
+	if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
+		cancel_writeback_rate_update_dwork(dc);
+
 	if (!IS_ERR_OR_NULL(dc->writeback_thread))
 		kthread_stop(dc->writeback_thread);
 	if (dc->writeback_write_wq)
 		destroy_workqueue(dc->writeback_write_wq);
 
-	mutex_lock(&bch_register_lock);
-
 	if (atomic_read(&dc->running))
 		bd_unlink_disk_holder(dc->bdev, dc->disk.disk);
 	bcache_device_free(&dc->disk);
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 78cd7bd50fdd..55673508628f 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -309,7 +309,8 @@ STORE(bch_cached_dev)
 		bch_writeback_queue(dc);
 
 	if (attr == &sysfs_writeback_percent)
-		schedule_delayed_work(&dc->writeback_rate_update,
+		if (!test_and_set_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
+			schedule_delayed_work(&dc->writeback_rate_update,
 				      dc->writeback_rate_update_seconds * HZ);
 
 	mutex_unlock(&bch_register_lock);
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 4dbeaaa575bf..8f98ef1038d3 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -115,6 +115,21 @@ static void update_writeback_rate(struct work_struct *work)
 					     struct cached_dev,
 					     writeback_rate_update);
 
+	/*
+	 * should check BCACHE_DEV_RATE_DW_RUNNING before calling
+	 * cancel_delayed_work_sync().
+	 */
+	set_bit(BCACHE_DEV_RATE_DW_RUNNING, &dc->disk.flags);
+	/* paired with where BCACHE_DEV_RATE_DW_RUNNING is tested */
+	smp_mb();
+
+	if (!test_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags)) {
+		clear_bit(BCACHE_DEV_RATE_DW_RUNNING, &dc->disk.flags);
+		/* paired with where BCACHE_DEV_RATE_DW_RUNNING is tested */
+		smp_mb();
+		return;
+	}
+
 	down_read(&dc->writeback_lock);
 
 	if (atomic_read(&dc->has_dirty) &&
@@ -123,8 +138,18 @@ static void update_writeback_rate(struct work_struct *work)
 
 	up_read(&dc->writeback_lock);
 
-	schedule_delayed_work(&dc->writeback_rate_update,
+	if (test_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags)) {
+		schedule_delayed_work(&dc->writeback_rate_update,
 			      dc->writeback_rate_update_seconds * HZ);
+	}
+
+	/*
+	 * should check BCACHE_DEV_RATE_DW_RUNNING before calling
+	 * cancel_delayed_work_sync().
+	 */
+	clear_bit(BCACHE_DEV_RATE_DW_RUNNING, &dc->disk.flags);
+	/* paired with where BCACHE_DEV_RATE_DW_RUNNING is tested */
+	smp_mb();
 }
 
 static unsigned writeback_delay(struct cached_dev *dc, unsigned sectors)
@@ -675,6 +700,7 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
 	dc->writeback_rate_p_term_inverse = 40;
 	dc->writeback_rate_i_term_inverse = 10000;
 
+	WARN_ON(test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags));
 	INIT_DELAYED_WORK(&dc->writeback_rate_update, update_writeback_rate);
 }
 
@@ -693,6 +719,7 @@ int bch_cached_dev_writeback_start(struct cached_dev *dc)
 		return PTR_ERR(dc->writeback_thread);
 	}
 
+	WARN_ON(test_and_set_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags));
 	schedule_delayed_work(&dc->writeback_rate_update,
 			      dc->writeback_rate_update_seconds * HZ);