[2/6] bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()

Message ID 20180502144659.118628-3-colyli@suse.de (mailing list archive)
State New, archived

Commit Message

Coly Li May 2, 2018, 2:46 p.m. UTC
Commit c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev") tries
to stop the bcache device by calling bcache_device_stop() when too many I/O
errors happen on the backing device. But if internal I/O is in progress on
the cache device (writeback scan, garbage collection, etc.), the regular
I/O request that triggered the internal I/O may still hold a refcount of
dc->count, and that refcount is only dropped after the internal I/O stops.

With this patch, bch_cached_dev_error() checks whether the backing device
is attached to a cache set. If it is, CACHE_SET_IO_DISABLE is set in the
flags of that cache set. Internal I/O on the cache device is then rejected
and stopped immediately, and the bcache device can be stopped.

For people who are not familiar with this refcount dependency, let me
explain in more detail how the fix works. For example, the writeback thread
scans the cache device for dirty data to write back. Until it stops, it
holds a refcount of dc->count. When the CACHE_SET_IO_DISABLE bit is set,
the internal I/O stops, and the while-loop in bch_writeback_thread() quits
and calls cached_dev_put() to drop dc->count. If this is the last refcount
to drop, cached_dev_detach_finish() is called. That callback in turn calls
closure_put(dc->disk.cl) to drop a refcount of the closure dc->disk.cl. If
this is the last refcount of the closure, cached_dev_flush() is called and
the cached device is then freed. So if CACHE_SET_IO_DISABLE is not set, the
bcache device cannot be stopped until all internal cache device I/O has
stopped. For a large cache device, where the writeback thread competes for
locks with the gc thread, the wait can be quite long.

Fixes: c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev")
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/super.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
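
The refcount chain described in the commit message can be modeled in a
few lines of userspace C. This is only a rough sketch, not the bcache
code: every toy_* name and the struct layout are made up for
illustration, the model is single-threaded, and each refcount starts at 1
so that the writeback thread holds the last reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these are NOT the real bcache structures. */
struct toy_cached_dev {
	int count;        /* stands in for dc->count */
	int closure_refs; /* stands in for the refcount of dc->disk.cl */
	bool io_disable;  /* stands in for CACHE_SET_IO_DISABLE in c->flags */
	bool freed;       /* set once the modeled flush step has run */
};

/* Models cached_dev_flush(): the last step before the device is freed. */
static void toy_flush(struct toy_cached_dev *dc)
{
	dc->freed = true;
}

/* Models closure_put(): the last closure reference triggers the flush. */
static void toy_closure_put(struct toy_cached_dev *dc)
{
	assert(dc->closure_refs > 0);
	if (--dc->closure_refs == 0)
		toy_flush(dc);
}

/* Models cached_dev_detach_finish(): drops one closure reference. */
static void toy_detach_finish(struct toy_cached_dev *dc)
{
	toy_closure_put(dc);
}

/* Models cached_dev_put(): the last dc->count reference runs the callback. */
static void toy_cached_dev_put(struct toy_cached_dev *dc)
{
	assert(dc->count > 0);
	if (--dc->count == 0)
		toy_detach_finish(dc);
}

/*
 * Models the while-loop in the writeback thread: it keeps scanning until
 * the work runs out or the disable flag is seen, and only then drops its
 * reference. Returns the number of scan steps that were performed.
 */
static int toy_writeback_thread(struct toy_cached_dev *dc, int pending_scans)
{
	int done = 0;

	while (done < pending_scans && !dc->io_disable)
		done++;

	toy_cached_dev_put(dc);
	return done;
}

/* Builds a device whose only references are held by the writeback thread. */
static struct toy_cached_dev toy_dev_init(bool io_disable)
{
	struct toy_cached_dev dc = {
		.count = 1,
		.closure_refs = 1,
		.io_disable = io_disable,
		.freed = false,
	};
	return dc;
}
```

In this model, toy_writeback_thread() on a device created with
toy_dev_init(true) returns 0 scan steps and frees the device right away,
while toy_dev_init(false) has to work through every pending step before
the final put happens. That difference is the delay the patch removes.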

Comments

Hannes Reinecke May 3, 2018, 5:53 a.m. UTC | #1
On 05/02/2018 04:46 PM, Coly Li wrote:
> Commit c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev") tries
> to stop bcache device by calling bcache_device_stop() when too many I/O
> errors happened on backing device. But if there is internal I/O happening
> on cache device (writeback scan, garbage collection, etc), a regular I/O
> request triggers the internal I/Os may still holds a refcount of dc->count,
> and the refcount may only be dropped after the internal I/O stopped.
> 
> By this patch, bch_cached_dev_error() will check if the backing device is
> attached to a cache set, if yes that CACHE_SET_IO_DISABLE will be set to
> flags of this cache set. Then internal I/Os on cache device will be
> rejected and stopped immediately, and the bcache device can be stopped.
> 
> For people who are not familiar with the interesting refcount dependance,
> let me explain a bit more how the fix works. Example the writeback thread
> will scan cache device for dirty data writeback purpose. Before it stopps,
> it holds a refcount of dc->count. When CACHE_SET_IO_DISABLE bit is set,
> the internal I/O will stopped and the while-loop in bch_writeback_thread()
> quits and calls cached_dev_put() to drop dc->count. If this is the last
> refcount to drop, then cached_dev_detach_finish() will be called. In this
> call back function, in turn closure_put(dc->disk.cl) is called to drop a
> refcount of closure dc->disk.cl. If this is the last refcount of this
> closure to drop, then cached_dev_flush() will be called. Then the cached
> device is freed. So if CACHE_SET_IO_DISABLE is not set, the bache device
> can not be stopped until all inernal cache device I/O stopped. For large
> size cache device, and writeback thread competes locks with gc thread,
> there might be a quite long time to wait.
> 
> Fixes: c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev")
> Signed-off-by: Coly Li <colyli@suse.de>
> ---
>   drivers/md/bcache/super.c | 17 +++++++++++++++++
>   1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 8196b19fada2..a0d5a3ccc7d0 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1369,6 +1369,8 @@ int bch_flash_dev_create(struct cache_set *c, uint64_t size)
>   
>   bool bch_cached_dev_error(struct cached_dev *dc)
>   {
> +	struct cache_set *c;
> +
>   	if (!dc || test_bit(BCACHE_DEV_CLOSING, &dc->disk.flags))
>   		return false;
>   
> @@ -1379,6 +1381,21 @@ bool bch_cached_dev_error(struct cached_dev *dc)
>   	pr_err("stop %s: too many IO errors on backing device %s\n",
>   		dc->disk.disk->disk_name, dc->backing_dev_name);
>   
> +	/*
> +	 * If the cached device is still attached to a cache set,
> +	 * even dc->io_disable is true and no more I/O requests
> +	 * accepted, cache device internal I/O (writeback scan or
> +	 * garbage collection) may still prevent bcache device from
> +	 * being stopped. So here CACHE_SET_IO_DISABLE should be
> +	 * set to c->flags too, to make the internal I/O to cache
> +	 * device rejected and stopped immediately.
> +	 * If c is NULL, that means the bcache device is not attached
> +	 * to any cache set, then no CACHE_SET_IO_DISABLE bit to set.
> +	 */
> +	c = dc->disk.c;
> +	if (c && test_and_set_bit(CACHE_SET_IO_DISABLE, &c->flags))
> +		pr_warn("CACHE_SET_IO_DISABLE already set");
> +
>   	bcache_device_stop(&dc->disk);
>   	return true;
>   }
> 
Neat.

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes

Patch

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 8196b19fada2..a0d5a3ccc7d0 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1369,6 +1369,8 @@  int bch_flash_dev_create(struct cache_set *c, uint64_t size)
 
 bool bch_cached_dev_error(struct cached_dev *dc)
 {
+	struct cache_set *c;
+
 	if (!dc || test_bit(BCACHE_DEV_CLOSING, &dc->disk.flags))
 		return false;
 
@@ -1379,6 +1381,21 @@  bool bch_cached_dev_error(struct cached_dev *dc)
 	pr_err("stop %s: too many IO errors on backing device %s\n",
 		dc->disk.disk->disk_name, dc->backing_dev_name);
 
+	/*
+	 * If the cached device is still attached to a cache set,
+	 * then even if dc->io_disable is true and no more I/O
+	 * requests are accepted, cache device internal I/O
+	 * (writeback scan or garbage collection) may still prevent
+	 * the bcache device from being stopped. So set
+	 * CACHE_SET_IO_DISABLE in c->flags too, so that internal
+	 * I/O to the cache device is rejected and stopped at once.
+	 * If c is NULL, the bcache device is not attached to any
+	 * cache set, so there is no CACHE_SET_IO_DISABLE bit to set.
+	 */
+	c = dc->disk.c;
+	if (c && test_and_set_bit(CACHE_SET_IO_DISABLE, &c->flags))
+		pr_warn("CACHE_SET_IO_DISABLE already set\n");
+
 	bcache_device_stop(&dc->disk);
 	return true;
 }
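
The warning in the hunk above relies on test_and_set_bit() returning the
previous value of the bit, so it only fires when another path has already
disabled I/O on the cache set. A minimal userspace sketch of that
behavior, assuming C11 atomics as a stand-in for the kernel bitops; the
toy_* helpers and the bit number are hypothetical and not taken from the
bcache sources:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Arbitrary bit number chosen for this sketch, not the kernel's value. */
#define TOY_IO_DISABLE 3

/*
 * Userspace stand-in for test_and_set_bit(): atomically sets bit nr in
 * *flags and reports whether that bit was already set beforehand.
 */
static bool toy_test_and_set_bit(unsigned int nr, atomic_ulong *flags)
{
	unsigned long mask = 1UL << nr;
	unsigned long old = atomic_fetch_or(flags, mask);

	return (old & mask) != 0;
}

/*
 * Mirrors the shape of the new check: returns true exactly when the
 * "already set" warning would be printed.
 */
static bool toy_disable_io(atomic_ulong *flags)
{
	return toy_test_and_set_bit(TOY_IO_DISABLE, flags);
}
```

The first call to toy_disable_io() on a zeroed flags word returns false
and leaves the bit set. Any later call returns true, which corresponds to
the case where the message is logged.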