[PATCH v4] bcache: only permit recovery from read errors when cache device is clean

Message ID 20171016180428.123657-1-colyli@suse.de (mailing list archive)
State New, archived

Commit Message

Coly Li Oct. 16, 2017, 6:04 p.m. UTC
When bcache does read I/Os, for example in writeback or writethrough mode,
if a read request on the cache device fails, bcache will try to recover
the request by reading from the cached device. If the data on the cached
device is not in sync with the cache device, the requester will get stale
data.

For critical storage systems such as databases, providing stale data from
recovery may result in application-level data corruption, which is
unacceptable.

With this patch, for a failed read request in writeback or writethrough
mode, a recoverable read request is only recovered when the cache device
is clean. That is to say, all data on the cached device is up to date.

For other cache modes in bcache, read requests never hit
cached_dev_read_error(), so they don't need this patch.

Please note, because the cache mode can be switched arbitrarily at run
time, a device in writethrough mode might have been switched from
writeback mode. Therefore checking dc->has_dirty in writethrough mode
still makes sense.

Changelog:
v4: Fix parens error pointed out by Michael Lyle.
v3: Per the response from Kent Overstreet, recovering stale data is a
    bug to fix and an option to permit it is unnecessary, so the sysfs
    file is removed in this version.
v2: rename sysfs entry from allow_stale_data_on_failure  to
    allow_stale_data_on_failure, and fix the confusing commit log.
v1: initial patch posted.

Signed-off-by: Coly Li <colyli@suse.de>
Reported-by: Arne Wolf <awolf@lenovo.com>
Acked-by: Michael Lyle <mlyle@lyle.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Nix <nix@esperi.org.uk>
Cc: Kai Krakow <hurikhan77@gmail.com>
Cc: Eric Wheeler <bcache@lists.ewheeler.net>
Cc: Junhui Tang <tang.junhui@zte.com.cn>
Cc: stable@vger.kernel.org
---
 drivers/md/bcache/request.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
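
The policy described in the commit message can be modeled outside the
kernel. The following is a minimal userspace sketch, not the bcache code:
`struct model_dev`, `enum model_mode`, and `may_retry_from_backing()` are
hypothetical names introduced for illustration, and C11 `atomic_int` stands
in for the kernel's `atomic_t`. It shows that the decision depends only on
the recoverable flag and the dirty flag, and not on the current cache mode.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the bcache structures. */
enum model_mode { MODEL_WRITETHROUGH, MODEL_WRITEBACK };

struct model_dev {
	enum model_mode mode;	/* may be switched at run time */
	atomic_int has_dirty;	/* non-zero while dirty data may remain in the cache */
};

/*
 * Return true if a failed cache read may be retried from the backing
 * device. The mode is deliberately not consulted: dirty data written in
 * writeback mode can still be present after a switch to writethrough.
 */
static bool may_retry_from_backing(struct model_dev *dev, bool recoverable)
{
	assert(dev != NULL);
	return recoverable && atomic_load(&dev->has_dirty) == 0;
}
```

Under these assumptions, a device in writethrough mode whose `has_dirty` is
still non-zero is refused a retry, which matches the note above about mode
switching.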

Comments

Eric Wheeler Oct. 27, 2017, 7:57 p.m. UTC | #1
On Tue, 17 Oct 2017, Coly Li wrote:

> When bcache does read I/Os, for example in writeback or writethrough mode,
> if a read request on the cache device fails, bcache will try to recover
> the request by reading from the cached device. If the data on the cached
> device is not in sync with the cache device, the requester will get stale
> data.
> 
> For critical storage systems such as databases, providing stale data from
> recovery may result in application-level data corruption, which is
> unacceptable.
> 
> With this patch, for a failed read request in writeback or writethrough
> mode, a recoverable read request is only recovered when the cache device
> is clean. That is to say, all data on the cached device is up to date.

Can this be relaxed to only return an error when the key that failed to
read is dirty? A 100% clean cache in writeback mode on a busy system seems
unlikely.

Can KEY_DIRTY facilitate this?


--
Eric Wheeler



> 
> For other cache modes in bcache, read requests never hit
> cached_dev_read_error(), so they don't need this patch.
> 
> Please note, because the cache mode can be switched arbitrarily at run
> time, a device in writethrough mode might have been switched from
> writeback mode. Therefore checking dc->has_dirty in writethrough mode
> still makes sense.
> 
> Changelog:
> v4: Fix parens error pointed out by Michael Lyle.
> v3: Per the response from Kent Overstreet, recovering stale data is a
>     bug to fix and an option to permit it is unnecessary, so the sysfs
>     file is removed in this version.
> v2: rename sysfs entry from allow_stale_data_on_failure  to
>     allow_stale_data_on_failure, and fix the confusing commit log.
> v1: initial patch posted.
> 
> Signed-off-by: Coly Li <colyli@suse.de>
> Reported-by: Arne Wolf <awolf@lenovo.com>
> Acked-by: Michael Lyle <mlyle@lyle.org>
> Cc: Kent Overstreet <kent.overstreet@gmail.com>
> Cc: Nix <nix@esperi.org.uk>
> Cc: Kai Krakow <hurikhan77@gmail.com>
> Cc: Eric Wheeler <bcache@lists.ewheeler.net>
> Cc: Junhui Tang <tang.junhui@zte.com.cn>
> Cc: stable@vger.kernel.org
> ---
>  drivers/md/bcache/request.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index 681b4f12b05a..e7f769ff7234 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -697,8 +697,16 @@ static void cached_dev_read_error(struct closure *cl)
>  {
>  	struct search *s = container_of(cl, struct search, cl);
>  	struct bio *bio = &s->bio.bio;
> +	struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
>  
> -	if (s->recoverable) {
> +	/*
> +	 * If the cache device is dirty (dc->has_dirty is non-zero),
> +	 * recovering a failed read request from the cached device may
> +	 * return stale data. So read failure recovery is only permitted
> +	 * when the cache device is clean.
> +	 */
> +	if (s->recoverable &&
> +	    (dc && !atomic_read(&dc->has_dirty))) {
>  		/* Retry from the backing device: */
>  		trace_bcache_read_retry(s->orig_bio);
>  
> -- 
> 2.13.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Michael Lyle Oct. 27, 2017, 8 p.m. UTC | #2
On Fri, Oct 27, 2017 at 12:57 PM, Eric Wheeler
<bcache@lists.ewheeler.net> wrote:
> Can this be relaxed to only return an error when the key that failed to
> read is dirty? A 100% clean cache in writeback mode on a busy system seems
> unlikely.
>
> Can KEY_DIRTY facilitate this?

Don't we only have the metadata to know if the key is dirty on the
cache if we have the cache device? ;)

Mike
Eric Wheeler Oct. 27, 2017, 9:13 p.m. UTC | #3
On Fri, 27 Oct 2017, Michael Lyle wrote:

> On Fri, Oct 27, 2017 at 12:57 PM, Eric Wheeler
> <bcache@lists.ewheeler.net> wrote:
> > Can this be relaxed to only return an error when the key that failed to
> > read is dirty? A 100% clean cache in writeback mode on a busy system seems
> > unlikely.
> >
> > Can KEY_DIRTY facilitate this?
> 
> Don't we only have the metadata to know if the key is dirty on the
> cache if we have the cache device? ;)

Certainly if this is for removal or a missing cache (perhaps I missed 
that).

However, I thought this was just a recovery on an IO error where the disk 
might be mostly dead--but partly alive!

Of course if the metadata lookup fails subsequently, then you would need 
to fall back to the dc->has_dirty flag.

--
Eric Wheeler



> 
> Mike
Michael Lyle Oct. 27, 2017, 9:24 p.m. UTC | #4
On Fri, Oct 27, 2017 at 2:13 PM, Eric Wheeler <bcache@lists.ewheeler.net> wrote:
[snip]
>> > Can KEY_DIRTY facilitate this?
>>
>> Don't we only have the metadata to know if the key is dirty on the
>> cache if we have the cache device? ;)
>
> Certainly if this is for removal or a missing cache (perhaps I missed
> that).
>
> However, I thought this was just a recovery on an IO error where the disk
> might be mostly dead--but partly alive!
>
> Of course if the metadata lookup fails subsequently, then you would need
> to fall back to the dc->has_dirty flag.

Seems like something that's a lot of effort for little gain.  It'll
only help when A) everything you need isn't dirty, and B) all the
associated btree nodes are in memory.  Since bcache tries to keep 10%
dirty by default, and it's likely to be heavy on filesystem metadata--
I don't know how often this would help anyone in the real world.

Mike
Eric Wheeler Oct. 27, 2017, 11:31 p.m. UTC | #5
On Fri, 27 Oct 2017, Michael Lyle wrote:

> On Fri, Oct 27, 2017 at 2:13 PM, Eric Wheeler <bcache@lists.ewheeler.net> wrote:
> [snip]
> >> > Can KEY_DIRTY facilitate this?
> >>
> >> Don't we only have the metadata to know if the key is dirty on the
> >> cache if we have the cache device? ;)
> >
> > Certainly if this is for removal or a missing cache (perhaps I missed
> > that).
> >
> > However, I thought this was just a recovery on an IO error where the disk
> > might be mostly dead--but partly alive!
> >
> > Of course if the metadata lookup fails subsequently, then you would need
> > to fall back to the dc->has_dirty flag.
> 
> Seems like something that's a lot of effort for little gain.  It'll
> only help when A) everything you need isn't dirty, and 

Indeed!

> B) all the associated btree nodes are in memory.  

Is the dirty key map not memory resident?  

> Since bcache tries to keep 10% dirty by default, and it's likely to be 
> heavy on filesystem metadata-- I don't know how often this would help 
> anyone in the real world.

Perhaps not---but if it is easy to implement, then this would provide the 
most optimistic recovery.



--
Eric Wheeler


> 
> Mike
Michael Lyle Oct. 27, 2017, 11:36 p.m. UTC | #6
On Fri, Oct 27, 2017 at 4:31 PM, Eric Wheeler <bcache@lists.ewheeler.net> wrote:
>> Seems like something that's a lot of effort for little gain.  It'll
>> only help when A) everything you need isn't dirty, and
>
> Indeed!
>
>> B) all the associated btree nodes are in memory.
>
> Is the dirty key map not memory resident?

The overall bcache btree says where stuff is resident and whether it's
dirty.  It is stored on disk and portions of it are cached in memory.

If data is hot, the relevant tree nodes are likely to be in memory,
but the data is also more likely to be dirty.

Mike

Patch

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 681b4f12b05a..e7f769ff7234 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -697,8 +697,16 @@  static void cached_dev_read_error(struct closure *cl)
 {
 	struct search *s = container_of(cl, struct search, cl);
 	struct bio *bio = &s->bio.bio;
+	struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
 
-	if (s->recoverable) {
+	/*
+	 * If the cache device is dirty (dc->has_dirty is non-zero),
+	 * recovering a failed read request from the cached device may
+	 * return stale data. So read failure recovery is only permitted
+	 * when the cache device is clean.
+	 */
+	if (s->recoverable &&
+	    (dc && !atomic_read(&dc->has_dirty))) {
 		/* Retry from the backing device: */
 		trace_bcache_read_retry(s->orig_bio);