diff mbox

[ndctl] ndctl, check: Add a sigbus handler to detect metadata corruption

Message ID 20170414000251.23904-1-vishal.l.verma@intel.com (mailing list archive)
State Accepted
Commit 116da6f54476
Headers show

Commit Message

Verma, Vishal L April 14, 2017, 12:02 a.m. UTC
If we have poison/badblocks in the BTT metadata sections, the mmap-reads
happening in the checker will trigger a SIGBUS, and the program will
halt abruptly. Add a sigbus handler which notifies the user of this, and
prints out a relevant error message:

  namespace5.0: namespace_check: checking namespace5.0
  namespace5.0: btt_discover_arenas: found 1 BTT arena
  namespace5.0: btt_check_arenas: checking arena 0
  namespace5.0: namespace_check: Received a SIGBUS
  namespace5.0: namespace_check: Metadata corruption found, recovery is not possible
  error checking namespaces: Bad address

Cc: Dan Williams <dan.j.williams@intel.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 ndctl/check.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

Comments

Rudoff, Andy April 14, 2017, 12:10 a.m. UTC | #1
> If we have poison/badblocks in the BTT metadata sections, the mmap-reads
> happening in the checker will trigger a SIGBUS, and the program will
> halt abruptly. Add a sigbus handler which notifies the user of this, and
> prints out a relevant error message:
> 
>  namespace5.0: namespace_check: checking namespace5.0
>  namespace5.0: btt_discover_arenas: found 1 BTT arena
>  namespace5.0: btt_check_arenas: checking arena 0
>  namespace5.0: namespace_check: Received a SIGBUS
>  namespace5.0: namespace_check: Metadata corruption found, recovery is not possible
>  error checking namespaces: Bad address

If in repair mode, should a SIGBUS cause you to set the error state in the associated arena?

-andy
Verma, Vishal L April 14, 2017, 7 p.m. UTC | #2
On Fri, 2017-04-14 at 00:10 +0000, Rudoff, Andy wrote:
> > If we have poison/badblocks in the BTT metadata sections, the mmap-

> > reads

> > happening in the checker will trigger a SIGBUS, and the program

> > will

> > halt abruptly. Add a sigbus handler which notifies the user of

> > this, and

> > prints out a relevant error message:

> > 

> >  namespace5.0: namespace_check: checking namespace5.0

> >  namespace5.0: btt_discover_arenas: found 1 BTT arena

> >  namespace5.0: btt_check_arenas: checking arena 0

> >  namespace5.0: namespace_check: Received a SIGBUS

> >  namespace5.0: namespace_check: Metadata corruption found, recovery

> > is not possible

> >  error checking namespaces: Bad address

> 

> If in repair mode, should a SIGBUS cause you to set the error state

> in the associated arena?

> 

> -andy

> 

So I thought a bit about this -
While we can set the error state, that leaves the BTT with no way to
get any data off it (currently).

If we allow the kernel to enable the namespace, the user has some
chance to do at least partial recovery until they hit the badblock.
(assuming the badblock is in the map, which is the largest on-disk
structure). If it is on the flog or info block, then, yes, there is no
recovery.

	-Vishal
Dan Williams April 14, 2017, 7:04 p.m. UTC | #3
On Fri, Apr 14, 2017 at 12:00 PM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Fri, 2017-04-14 at 00:10 +0000, Rudoff, Andy wrote:
>> > If we have poison/badblocks in the BTT metadata sections, the mmap-
>> > reads
>> > happening in the checker will trigger a SIGBUS, and the program
>> > will
>> > halt abruptly. Add a sigbus handler which notifies the user of
>> > this, and
>> > prints out a relevant error message:
>> >
>> >  namespace5.0: namespace_check: checking namespace5.0
>> >  namespace5.0: btt_discover_arenas: found 1 BTT arena
>> >  namespace5.0: btt_check_arenas: checking arena 0
>> >  namespace5.0: namespace_check: Received a SIGBUS
>> >  namespace5.0: namespace_check: Metadata corruption found, recovery
>> > is not possible
>> >  error checking namespaces: Bad address
>>
>> If in repair mode, should a SIGBUS cause you to set the error state
>> in the associated arena?
>>
>> -andy
>>
> So I thought a bit about this -
> While we can set the error state, that leaves the BTT with no way to
> get any data off it (currently).
>
> If we allow the kernel to enable the namespace, the user has some
> chance to do at least partial recovery until they hit the badblock.
> (assuming the badblock is in the map, which is the largest on-disk
> structure). If it is on the flog or info block, then, yes, there is no
> recovery.

Also, the kernel currently does nothing with the error bit besides
print out that it saw it.
Rudoff, Andy April 14, 2017, 7:26 p.m. UTC | #4
On Fri, Apr 14, 2017 at 12:00 PM, Verma, Vishal L
    <vishal.l.verma@intel.com> wrote:
    > On Fri, 2017-04-14 at 00:10 +0000, Rudoff, Andy wrote:

    >> > If we have poison/badblocks in the BTT metadata sections, the mmap-

    >> > reads

    >> > happening in the checker will trigger a SIGBUS, and the program

    >> > will

    >> > halt abruptly. Add a sigbus handler which notifies the user of

    >> > this, and

    >> > prints out a relevant error message:

    >> >

    >> >  namespace5.0: namespace_check: checking namespace5.0

    >> >  namespace5.0: btt_discover_arenas: found 1 BTT arena

    >> >  namespace5.0: btt_check_arenas: checking arena 0

    >> >  namespace5.0: namespace_check: Received a SIGBUS

    >> >  namespace5.0: namespace_check: Metadata corruption found, recovery

    >> > is not possible

    >> >  error checking namespaces: Bad address

    >>

    >> If in repair mode, should a SIGBUS cause you to set the error state

    >> in the associated arena?

    >>

    >> -andy

    >>

    > So I thought a bit about this -

    > While we can set the error state, that leaves the BTT with no way to

    > get any data off it (currently).

    >

    > If we allow the kernel to enable the namespace, the user has some

    > chance to do at least partial recovery until they hit the badblock.

    > (assuming the badblock is in the map, which is the largest on-disk

    > structure). If it is on the flog or info block, then, yes, there is no

    > recovery.

    
Yeah, I thought about this a little more too.  Here’s what I think should happen:

Without --repair, I agree with this patch, although the error message might
tell the poor user that running with --repair is the next step if they want to
try to fix it (see below).

With --repair, the repair code should try to replace the poison with
repaired metadata, clearing the appropriate error bits when done.

For example, if the poison is in the map, the repair code should figure
out which entries have no map entry, clear the poison, and update the
map, printing a strong warning that the blocks were repaired but may
not be in the right place any more.  If the poison is in something like
the arena info block, a replacement can be constructed from the
backup info.  The point is that the tool shouldn’t die on poison, it
should be the way you fix it and get rid of the poison.

Finally, the kernel should be setting and honoring the error bits, both
in map entries and arena info blocks.  When poison is discovered in
metadata, the arena info block error state should be set and that
should not disallow getting to any available data as you stated, but
should instead make that arena read-only.  The only way to get out
of read-only mode for the arena should be to run the check tool
with --repair and all the stuff I said above then happens.
Verma, Vishal L April 14, 2017, 7:40 p.m. UTC | #5
On Fri, 2017-04-14 at 19:26 +0000, Rudoff, Andy wrote:
>     On Fri, Apr 14, 2017 at 12:00 PM, Verma, Vishal L

[...]
>     

> Yeah, I thought about this a little more too.  Here’s what I think

> should happen:

> 

> Without --repair, I agree with this patch, although the error message

> might

> tell the poor user that running with --repair is the next step if

> they want to

> try to fix it (see below).

> 

> With --repair, the repair code should try to replace the poison with

> repaired metadata, clearing the appropriate error bits when done.

> 

> For example, if the poison is in the map, the repair code should

> figure

> out which entries have no map entry, clear the poison, and update the

> map, printing a strong warning that the blocks were repaired but may

> not be in the right place any more.  


If we hit a known badblock, that is 512B worth of map entries (128).
Should we really (almost certainly) scramble 64 blocks? :)
If it is a latent error, it will still be at least a cache line worth
of map entries, i.e. 16.

If an error is in the log, then in the badblock case, we lose both log
and log' for four lanes. Which means we can't tell is one of those 4
entries needed a map update, leaving a potential corruption window
open.

> If the poison is in something like

> the arena info block, a replacement can be constructed from the

> backup info.  The point is that the tool shouldn’t die on poison, it

> should be the way you fix it and get rid of the poison.


This might be the only case where we can clear the poison
successfully..

> 

> Finally, the kernel should be setting and honoring the error bits,

> both

> in map entries and arena info blocks.  When poison is discovered in

> metadata, the arena info block error state should be set and that

> should not disallow getting to any available data as you stated, but

> should instead make that arena read-only.  


Agreed about the error bit and read-only, I will look at that.

> The only way to get out

> of read-only mode for the arena should be to run the check tool

> with --repair and all the stuff I said above then happens.

>  

>
Rudoff, Andy April 14, 2017, 7:52 p.m. UTC | #6
>If we hit a known badblock, that is 512B worth of map entries (128).

    >Should we really (almost certainly) scramble 64 blocks? :)

    >If it is a latent error, it will still be at least a cache line worth

    >of map entries, i.e. 16.

    >

    >If an error is in the log, then in the badblock case, we lose both log

    >and log' for four lanes. Which means we can't tell is one of those 4

    >entries needed a map update, leaving a potential corruption window

    >open.

    
Before responding to this, can you walk me through the steps
a user is expected to take if poison appears the flog, for example?
Say I just want to copy out as much data as I can get, then recreate
a fresh BTT.  I think you’re telling me that a poisoned flog prevents
me from using the entire device, but maybe I’m wrong and you’re
saying I can still get to the other arenas?

Anyway, after copying out what I can, how do I recreate the BTT?
When the BTT creation code starts writing the new flog, will that
clear the poison?  Does the BTT creation code do big enough writes
so the driver will issue Clear Uncorrectable commands?

With your answers, I think we can create a typical user flow,
where the user gets error messages and tries to use our tools
to fix things.  That should tell us if we’re on the right track
or if we’re stranding the poor user with no way to recover…

Thanks,

-andy
Verma, Vishal L April 14, 2017, 8:28 p.m. UTC | #7
On Fri, 2017-04-14 at 19:52 +0000, Rudoff, Andy wrote:
>     >If we hit a known badblock, that is 512B worth of map entries

> (128).

>     >Should we really (almost certainly) scramble 64 blocks? :)

>     >If it is a latent error, it will still be at least a cache line

> worth

>     >of map entries, i.e. 16.

>     >

>     >If an error is in the log, then in the badblock case, we lose

> both log

>     >and log' for four lanes. Which means we can't tell is one of

> those 4

>     >entries needed a map update, leaving a potential corruption

> window

>     >open.

>     

> Before responding to this, can you walk me through the steps

> a user is expected to take if poison appears the flog, for example?

> Say I just want to copy out as much data as I can get, then recreate

> a fresh BTT.  I think you’re telling me that a poisoned flog prevents

> me from using the entire device, but maybe I’m wrong and you’re

> saying I can still get to the other arenas?


So I think currently the best way is to recreate the namespace and
restore from a backup.. I wonder if we have a problem hiding even here,
but more on that below.
I don't think all access will be prevented, even to that arena, and
certainly not the whole device. My point was only that we can't clear
the poison, because then all notification of "something went wrong" is
lost.

One option could be to clear the bock of poisoned map entries, and
replace them with 'error' entries (the error bit in each map entry).
That way reads will continue to fail, and writes to those entries will
fix them. I'll have to work through it and make sure this works on the
kernel side first..

> 

> Anyway, after copying out what I can, how do I recreate the BTT?

> When the BTT creation code starts writing the new flog, will that

> clear the poison?  Does the BTT creation code do big enough writes

> so the driver will issue Clear Uncorrectable commands?


After 4.12, we will at least have the ability to clear poison for data
blocks. i.e. when the requests going through rw_bytes are aligned and
sized to 512B sectors. Map and flog writes are not large enough, and so
won't clear errors..
And maybe this is something we need to fix otherwise.

Consider a scenario where we had metadata poison, we delete and
recreate the namespace. This in itself doesn't hit the poison. We will
end up hitting CMCIs when we write metaddata and Uncorrectables on
reads again since the poison wasn't cleared.

Maybe what we need here is during BTT creation, check for at least
known badblocks, and initialize them so that we know at least new BTTs
are clean.

> 

> With your answers, I think we can create a typical user flow,

> where the user gets error messages and tries to use our tools

> to fix things.  That should tell us if we’re on the right track

> or if we’re stranding the poor user with no way to recover…


Agreed, current recovery path (from poison) sure involves slaying
dragons :)

> 

> Thanks,

> 

> -andy

> 

>
Rudoff, Andy April 14, 2017, 8:31 p.m. UTC | #8
>Maybe what we need here is during BTT creation, check for at least
>known badblocks, and initialize them so that we know at least new BTTs
>are clean.

I totally agree with this one.  Sounds like a must-have to me.  The
idea of poison in a location that sticks around even after you recreate
the BTT makes me shudder.

Thanks,

-andy
Jeff Moyer April 17, 2017, 3:37 p.m. UTC | #9
Vishal Verma <vishal.l.verma@intel.com> writes:

> If we have poison/badblocks in the BTT metadata sections, the mmap-reads
> happening in the checker will trigger a SIGBUS, and the program will
> halt abruptly. Add a sigbus handler which notifies the user of this, and
> prints out a relevant error message:

This isn't much better than just segfaulting.  Did you consider whether
it was possible to reconstruct corrupt metadata in any circumstances?

-Jeff

>
>   namespace5.0: namespace_check: checking namespace5.0
>   namespace5.0: btt_discover_arenas: found 1 BTT arena
>   namespace5.0: btt_check_arenas: checking arena 0
>   namespace5.0: namespace_check: Received a SIGBUS
>   namespace5.0: namespace_check: Metadata corruption found, recovery is not possible
>   error checking namespaces: Bad address
>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Reported-by: Jeff Moyer <jmoyer@redhat.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>  ndctl/check.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
>
> diff --git a/ndctl/check.c b/ndctl/check.c
> index 3b30a98..3775c2e 100644
> --- a/ndctl/check.c
> +++ b/ndctl/check.c
> @@ -13,6 +13,8 @@
>  #include <stdio.h>
>  #include <fcntl.h>
>  #include <errno.h>
> +#include <setjmp.h>
> +#include <signal.h>
>  #include <stdint.h>
>  #include <stdlib.h>
>  #include <unistd.h>
> @@ -40,6 +42,13 @@
>  #include <ndctl.h>
>  #endif
>  
> +static sigjmp_buf sj_env;
> +
> +static void sigbus_hdl(int sig, siginfo_t *siginfo, void *ptr)
> +{
> +	siglongjmp(sj_env, 1);
> +}
> +
>  static int repair_msg(struct btt_chk *bttc)
>  {
>  	info(bttc, "  Run with --repair to make the changes\n");
> @@ -872,6 +881,7 @@ int namespace_check(struct ndctl_namespace *ndns, struct check_opts *opts)
>  	int raw_mode, rc, disabled_flag = 0, open_flags;
>  	struct btt_sb *btt_sb;
>  	struct btt_chk *bttc;
> +	struct sigaction act;
>  	char path[50];
>  
>  	bttc = calloc(1, sizeof(*bttc));
> @@ -882,6 +892,15 @@ int namespace_check(struct ndctl_namespace *ndns, struct check_opts *opts)
>  	if (opts->verbose)
>  		bttc->ctx.log_priority = LOG_DEBUG;
>  
> +	memset(&act, 0, sizeof(act));
> +	act.sa_sigaction = sigbus_hdl;
> +	act.sa_flags = SA_SIGINFO;
> +
> +	if (sigaction(SIGBUS, &act, 0)) {
> +		err(bttc, "Unable to set sigaction\n");
> +		return -errno;
> +	}
> +
>  	bttc->opts = opts;
>  	bttc->start_off = BTT_START_OFFSET;
>  	bttc->sys_page_size = sysconf(_SC_PAGESIZE);
> @@ -949,6 +968,18 @@ int namespace_check(struct ndctl_namespace *ndns, struct check_opts *opts)
>  		goto out_sb;
>  	}
>  
> +	/*
> +	 * This is where we jump to if we receive a SIGBUS, prior to doing any
> +	 * mmaped reads, and can safely abort
> +	 */
> +	if (sigsetjmp(sj_env, 1)) {
> +		err(bttc, "Received a SIGBUS\n");
> +		err(bttc,
> +			"Metadata corruption found, recovery is not possible\n");
> +		rc = -EFAULT;
> +		goto out_close;
> +	}
> +
>  	rc = btt_info_read_verify(bttc, btt_sb, bttc->start_off);
>  	if (rc) {
>  		rc = btt_recover_first_sb(bttc);
Verma, Vishal L April 18, 2017, 4:09 p.m. UTC | #10
On Mon, 2017-04-17 at 11:37 -0400, Jeff Moyer wrote:
> Vishal Verma <vishal.l.verma@intel.com> writes:

> 

> > If we have poison/badblocks in the BTT metadata sections, the mmap-

> > reads

> > happening in the checker will trigger a SIGBUS, and the program

> > will

> > halt abruptly. Add a sigbus handler which notifies the user of

> > this, and

> > prints out a relevant error message:

> 

> This isn't much better than just segfaulting.  Did you consider

> whether

> it was possible to reconstruct corrupt metadata in any circumstances?

> 


The only spot we can actually correct metadata would be for an info
block, if the copy is intact. For any other poison in the map or flog,
I don't think we have any way to recover. Was that (info blocks) what
you meant, or something else?
diff mbox

Patch

diff --git a/ndctl/check.c b/ndctl/check.c
index 3b30a98..3775c2e 100644
--- a/ndctl/check.c
+++ b/ndctl/check.c
@@ -13,6 +13,8 @@ 
 #include <stdio.h>
 #include <fcntl.h>
 #include <errno.h>
+#include <setjmp.h>
+#include <signal.h>
 #include <stdint.h>
 #include <stdlib.h>
 #include <unistd.h>
@@ -40,6 +42,13 @@ 
 #include <ndctl.h>
 #endif
 
+static sigjmp_buf sj_env;
+
+static void sigbus_hdl(int sig, siginfo_t *siginfo, void *ptr)
+{
+	siglongjmp(sj_env, 1);
+}
+
 static int repair_msg(struct btt_chk *bttc)
 {
 	info(bttc, "  Run with --repair to make the changes\n");
@@ -872,6 +881,7 @@  int namespace_check(struct ndctl_namespace *ndns, struct check_opts *opts)
 	int raw_mode, rc, disabled_flag = 0, open_flags;
 	struct btt_sb *btt_sb;
 	struct btt_chk *bttc;
+	struct sigaction act;
 	char path[50];
 
 	bttc = calloc(1, sizeof(*bttc));
@@ -882,6 +892,15 @@  int namespace_check(struct ndctl_namespace *ndns, struct check_opts *opts)
 	if (opts->verbose)
 		bttc->ctx.log_priority = LOG_DEBUG;
 
+	memset(&act, 0, sizeof(act));
+	act.sa_sigaction = sigbus_hdl;
+	act.sa_flags = SA_SIGINFO;
+
+	if (sigaction(SIGBUS, &act, 0)) {
+		err(bttc, "Unable to set sigaction\n");
+		return -errno;
+	}
+
 	bttc->opts = opts;
 	bttc->start_off = BTT_START_OFFSET;
 	bttc->sys_page_size = sysconf(_SC_PAGESIZE);
@@ -949,6 +968,18 @@  int namespace_check(struct ndctl_namespace *ndns, struct check_opts *opts)
 		goto out_sb;
 	}
 
+	/*
+	 * This is where we jump to if we receive a SIGBUS, prior to doing any
+	 * mmaped reads, and can safely abort
+	 */
+	if (sigsetjmp(sj_env, 1)) {
+		err(bttc, "Received a SIGBUS\n");
+		err(bttc,
+			"Metadata corruption found, recovery is not possible\n");
+		rc = -EFAULT;
+		goto out_close;
+	}
+
 	rc = btt_info_read_verify(bttc, btt_sb, bttc->start_off);
 	if (rc) {
 		rc = btt_recover_first_sb(bttc);