Message ID | 20170414000251.23904-1-vishal.l.verma@intel.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 116da6f54476 |
Headers | show |
> If we have poison/badblocks in the BTT metadata sections, the mmap-reads > happening in the checker will trigger a SIGBUS, and the program will > halt abruptly. Add a sigbus handler which notifies the user of this, and > prints out a relevant error message: > > namespace5.0: namespace_check: checking namespace5.0 > namespace5.0: btt_discover_arenas: found 1 BTT arena > namespace5.0: btt_check_arenas: checking arena 0 > namespace5.0: namespace_check: Received a SIGBUS > namespace5.0: namespace_check: Metadata corruption found, recovery is not possible > error checking namespaces: Bad address If in repair mode, should a SIGBUS cause you to set the error state in the associated arena? -andy
On Fri, 2017-04-14 at 00:10 +0000, Rudoff, Andy wrote: > > If we have poison/badblocks in the BTT metadata sections, the mmap- > > reads > > happening in the checker will trigger a SIGBUS, and the program > > will > > halt abruptly. Add a sigbus handler which notifies the user of > > this, and > > prints out a relevant error message: > > > > namespace5.0: namespace_check: checking namespace5.0 > > namespace5.0: btt_discover_arenas: found 1 BTT arena > > namespace5.0: btt_check_arenas: checking arena 0 > > namespace5.0: namespace_check: Received a SIGBUS > > namespace5.0: namespace_check: Metadata corruption found, recovery > > is not possible > > error checking namespaces: Bad address > > If in repair mode, should a SIGBUS cause you to set the error state > in the associated arena? > > -andy > So I thought a bit about this - While we can set the error state, that leaves the BTT with no way to get any data off it (currently). If we allow the kernel to enable the namespace, the user has some chance to do at least partial recovery until they hit the badblock. (assuming the badblock is in the map, which is the largest on-disk structure). If it is on the flog or info block, then, yes, there is no recovery. -Vishal
On Fri, Apr 14, 2017 at 12:00 PM, Verma, Vishal L <vishal.l.verma@intel.com> wrote: > On Fri, 2017-04-14 at 00:10 +0000, Rudoff, Andy wrote: >> > If we have poison/badblocks in the BTT metadata sections, the mmap- >> > reads >> > happening in the checker will trigger a SIGBUS, and the program >> > will >> > halt abruptly. Add a sigbus handler which notifies the user of >> > this, and >> > prints out a relevant error message: >> > >> > namespace5.0: namespace_check: checking namespace5.0 >> > namespace5.0: btt_discover_arenas: found 1 BTT arena >> > namespace5.0: btt_check_arenas: checking arena 0 >> > namespace5.0: namespace_check: Received a SIGBUS >> > namespace5.0: namespace_check: Metadata corruption found, recovery >> > is not possible >> > error checking namespaces: Bad address >> >> If in repair mode, should a SIGBUS cause you to set the error state >> in the associated arena? >> >> -andy >> > So I thought a bit about this - > While we can set the error state, that leaves the BTT with no way to > get any data off it (currently). > > If we allow the kernel to enable the namespace, the user has some > chance to do at least partial recovery until they hit the badblock. > (assuming the badblock is in the map, which is the largest on-disk > structure). If it is on the flog or info block, then, yes, there is no > recovery. Also, the kernel currently does nothing with the error bit besides print out that it saw it.
On Fri, Apr 14, 2017 at 12:00 PM, Verma, Vishal L <vishal.l.verma@intel.com> wrote: > On Fri, 2017-04-14 at 00:10 +0000, Rudoff, Andy wrote: >> > If we have poison/badblocks in the BTT metadata sections, the mmap- >> > reads >> > happening in the checker will trigger a SIGBUS, and the program >> > will >> > halt abruptly. Add a sigbus handler which notifies the user of >> > this, and >> > prints out a relevant error message: >> > >> > namespace5.0: namespace_check: checking namespace5.0 >> > namespace5.0: btt_discover_arenas: found 1 BTT arena >> > namespace5.0: btt_check_arenas: checking arena 0 >> > namespace5.0: namespace_check: Received a SIGBUS >> > namespace5.0: namespace_check: Metadata corruption found, recovery >> > is not possible >> > error checking namespaces: Bad address >> >> If in repair mode, should a SIGBUS cause you to set the error state >> in the associated arena? >> >> -andy >> > So I thought a bit about this - > While we can set the error state, that leaves the BTT with no way to > get any data off it (currently). > > If we allow the kernel to enable the namespace, the user has some > chance to do at least partial recovery until they hit the badblock. > (assuming the badblock is in the map, which is the largest on-disk > structure). If it is on the flog or info block, then, yes, there is no > recovery. Yeah, I thought about this a little more too. Here’s what I think should happen: Without --repair, I agree with this patch, although the error message might tell the poor user that running with --repair is the next step if they want to try to fix it (see below). With --repair, the repair code should try to replace the poison with repaired metadata, clearing the appropriate error bits when done. For example, if the poison is in the map, the repair code should figure out which entries have no map entry, clear the poison, and update the map, printing a strong warning that the blocks were repaired but may not be in the right place any more. If the poison is in something like the arena info block, a replacement can be constructed from the backup info. The point is that the tool shouldn’t die on poison, it should be the way you fix it and get rid of the poison. Finally, the kernel should be setting and honoring the error bits, both in map entries and arena info blocks. When poison is discovered in metadata, the arena info block error state should be set and that should not disallow getting to any available data as you stated, but should instead make that arena read-only. The only way to get out of read-only mode for the arena should be to run the check tool with --repair and all the stuff I said above then happens.
On Fri, 2017-04-14 at 19:26 +0000, Rudoff, Andy wrote: > On Fri, Apr 14, 2017 at 12:00 PM, Verma, Vishal L [...] > > Yeah, I thought about this a little more too. Here’s what I think > should happen: > > Without --repair, I agree with this patch, although the error message > might > tell the poor user that running with --repair is the next step if > they want to > try to fix it (see below). > > With --repair, the repair code should try to replace the poison with > repaired metadata, clearing the appropriate error bits when done. > > For example, if the poison is in the map, the repair code should > figure > out which entries have no map entry, clear the poison, and update the > map, printing a strong warning that the blocks were repaired but may > not be in the right place any more. If we hit a known badblock, that is 512B worth of map entries (128). Should we really (almost certainly) scramble 64 blocks? :) If it is a latent error, it will still be at least a cache line worth of map entries, i.e. 16. If an error is in the log, then in the badblock case, we lose both log and log' for four lanes. Which means we can't tell is one of those 4 entries needed a map update, leaving a potential corruption window open. > If the poison is in something like > the arena info block, a replacement can be constructed from the > backup info. The point is that the tool shouldn’t die on poison, it > should be the way you fix it and get rid of the poison. This might be the only case where we can clear the poison successfully.. > > Finally, the kernel should be setting and honoring the error bits, > both > in map entries and arena info blocks. When poison is discovered in > metadata, the arena info block error state should be set and that > should not disallow getting to any available data as you stated, but > should instead make that arena read-only. Agreed about the error bit and read-only, I will look at that. > The only way to get out > of read-only mode for the arena should be to run the check tool > with --repair and all the stuff I said above then happens. > >
>If we hit a known badblock, that is 512B worth of map entries (128). >Should we really (almost certainly) scramble 64 blocks? :) >If it is a latent error, it will still be at least a cache line worth >of map entries, i.e. 16. > >If an error is in the log, then in the badblock case, we lose both log >and log' for four lanes. Which means we can't tell is one of those 4 >entries needed a map update, leaving a potential corruption window >open. Before responding to this, can you walk me through the steps a user is expected to take if poison appears the flog, for example? Say I just want to copy out as much data as I can get, then recreate a fresh BTT. I think you’re telling me that a poisoned flog prevents me from using the entire device, but maybe I’m wrong and you’re saying I can still get to the other arenas? Anyway, after copying out what I can, how do I recreate the BTT? When the BTT creation code starts writing the new flog, will that clear the poison? Does the BTT creation code do big enough writes so the driver will issue Clear Uncorrectable commands? With your answers, I think we can create a typical user flow, where the user gets error messages and tries to use our tools to fix things. That should tell us if we’re on the right track or if we’re stranding the poor user with no way to recover… Thanks, -andy
On Fri, 2017-04-14 at 19:52 +0000, Rudoff, Andy wrote: > >If we hit a known badblock, that is 512B worth of map entries > (128). > >Should we really (almost certainly) scramble 64 blocks? :) > >If it is a latent error, it will still be at least a cache line > worth > >of map entries, i.e. 16. > > > >If an error is in the log, then in the badblock case, we lose > both log > >and log' for four lanes. Which means we can't tell is one of > those 4 > >entries needed a map update, leaving a potential corruption > window > >open. > > Before responding to this, can you walk me through the steps > a user is expected to take if poison appears the flog, for example? > Say I just want to copy out as much data as I can get, then recreate > a fresh BTT. I think you’re telling me that a poisoned flog prevents > me from using the entire device, but maybe I’m wrong and you’re > saying I can still get to the other arenas? So I think currently the best way is to recreate the namespace and restore from a backup.. I wonder if we have a problem hiding even here, but more on that below. I don't think all access will be prevented, even to that arena, and certainly not the whole device. My point was only that we can't clear the poison, because then all notification of "something went wrong" is lost. One option could be to clear the bock of poisoned map entries, and replace them with 'error' entries (the error bit in each map entry). That way reads will continue to fail, and writes to those entries will fix them. I'll have to work through it and make sure this works on the kernel side first.. > > Anyway, after copying out what I can, how do I recreate the BTT? > When the BTT creation code starts writing the new flog, will that > clear the poison? Does the BTT creation code do big enough writes > so the driver will issue Clear Uncorrectable commands? After 4.12, we will at least have the ability to clear poison for data blocks. i.e. when the requests going through rw_bytes are aligned and sized to 512B sectors. Map and flog writes are not large enough, and so won't clear errors.. And maybe this is something we need to fix otherwise. Consider a scenario where we had metadata poison, we delete and recreate the namespace. This in itself doesn't hit the poison. We will end up hitting CMCIs when we write metaddata and Uncorrectables on reads again since the poison wasn't cleared. Maybe what we need here is during BTT creation, check for at least known badblocks, and initialize them so that we know at least new BTTs are clean. > > With your answers, I think we can create a typical user flow, > where the user gets error messages and tries to use our tools > to fix things. That should tell us if we’re on the right track > or if we’re stranding the poor user with no way to recover… Agreed, current recovery path (from poison) sure involves slaying dragons :) > > Thanks, > > -andy > >
>Maybe what we need here is during BTT creation, check for at least >known badblocks, and initialize them so that we know at least new BTTs >are clean. I totally agree with this one. Sounds like a must-have to me. The idea of poison in a location that sticks around even after you recreate the BTT makes me shudder. Thanks, -andy
Vishal Verma <vishal.l.verma@intel.com> writes: > If we have poison/badblocks in the BTT metadata sections, the mmap-reads > happening in the checker will trigger a SIGBUS, and the program will > halt abruptly. Add a sigbus handler which notifies the user of this, and > prints out a relevant error message: This isn't much better than just segfaulting. Did you consider whether it was possible to reconstruct corrupt metadata in any circumstances? -Jeff > > namespace5.0: namespace_check: checking namespace5.0 > namespace5.0: btt_discover_arenas: found 1 BTT arena > namespace5.0: btt_check_arenas: checking arena 0 > namespace5.0: namespace_check: Received a SIGBUS > namespace5.0: namespace_check: Metadata corruption found, recovery is not possible > error checking namespaces: Bad address > > Cc: Dan Williams <dan.j.williams@intel.com> > Reported-by: Jeff Moyer <jmoyer@redhat.com> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> > --- > ndctl/check.c | 31 +++++++++++++++++++++++++++++++ > 1 file changed, 31 insertions(+) > > diff --git a/ndctl/check.c b/ndctl/check.c > index 3b30a98..3775c2e 100644 > --- a/ndctl/check.c > +++ b/ndctl/check.c > @@ -13,6 +13,8 @@ > #include <stdio.h> > #include <fcntl.h> > #include <errno.h> > +#include <setjmp.h> > +#include <signal.h> > #include <stdint.h> > #include <stdlib.h> > #include <unistd.h> > @@ -40,6 +42,13 @@ > #include <ndctl.h> > #endif > > +static sigjmp_buf sj_env; > + > +static void sigbus_hdl(int sig, siginfo_t *siginfo, void *ptr) > +{ > + siglongjmp(sj_env, 1); > +} > + > static int repair_msg(struct btt_chk *bttc) > { > info(bttc, " Run with --repair to make the changes\n"); > @@ -872,6 +881,7 @@ int namespace_check(struct ndctl_namespace *ndns, struct check_opts *opts) > int raw_mode, rc, disabled_flag = 0, open_flags; > struct btt_sb *btt_sb; > struct btt_chk *bttc; > + struct sigaction act; > char path[50]; > > bttc = calloc(1, sizeof(*bttc)); > @@ -882,6 +892,15 @@ int namespace_check(struct ndctl_namespace *ndns, struct check_opts *opts) > if (opts->verbose) > bttc->ctx.log_priority = LOG_DEBUG; > > + memset(&act, 0, sizeof(act)); > + act.sa_sigaction = sigbus_hdl; > + act.sa_flags = SA_SIGINFO; > + > + if (sigaction(SIGBUS, &act, 0)) { > + err(bttc, "Unable to set sigaction\n"); > + return -errno; > + } > + > bttc->opts = opts; > bttc->start_off = BTT_START_OFFSET; > bttc->sys_page_size = sysconf(_SC_PAGESIZE); > @@ -949,6 +968,18 @@ int namespace_check(struct ndctl_namespace *ndns, struct check_opts *opts) > goto out_sb; > } > > + /* > + * This is where we jump to if we receive a SIGBUS, prior to doing any > + * mmaped reads, and can safely abort > + */ > + if (sigsetjmp(sj_env, 1)) { > + err(bttc, "Received a SIGBUS\n"); > + err(bttc, > + "Metadata corruption found, recovery is not possible\n"); > + rc = -EFAULT; > + goto out_close; > + } > + > rc = btt_info_read_verify(bttc, btt_sb, bttc->start_off); > if (rc) { > rc = btt_recover_first_sb(bttc);
On Mon, 2017-04-17 at 11:37 -0400, Jeff Moyer wrote: > Vishal Verma <vishal.l.verma@intel.com> writes: > > > If we have poison/badblocks in the BTT metadata sections, the mmap- > > reads > > happening in the checker will trigger a SIGBUS, and the program > > will > > halt abruptly. Add a sigbus handler which notifies the user of > > this, and > > prints out a relevant error message: > > This isn't much better than just segfaulting. Did you consider > whether > it was possible to reconstruct corrupt metadata in any circumstances? > The only spot we can actually correct metadata would be for an info block, if the copy is intact. For any other poison in the map or flog, I don't think we have any way to recover. Was that (info blocks) what you meant, or something else?
diff --git a/ndctl/check.c b/ndctl/check.c index 3b30a98..3775c2e 100644 --- a/ndctl/check.c +++ b/ndctl/check.c @@ -13,6 +13,8 @@ #include <stdio.h> #include <fcntl.h> #include <errno.h> +#include <setjmp.h> +#include <signal.h> #include <stdint.h> #include <stdlib.h> #include <unistd.h> @@ -40,6 +42,13 @@ #include <ndctl.h> #endif +static sigjmp_buf sj_env; + +static void sigbus_hdl(int sig, siginfo_t *siginfo, void *ptr) +{ + siglongjmp(sj_env, 1); +} + static int repair_msg(struct btt_chk *bttc) { info(bttc, " Run with --repair to make the changes\n"); @@ -872,6 +881,7 @@ int namespace_check(struct ndctl_namespace *ndns, struct check_opts *opts) int raw_mode, rc, disabled_flag = 0, open_flags; struct btt_sb *btt_sb; struct btt_chk *bttc; + struct sigaction act; char path[50]; bttc = calloc(1, sizeof(*bttc)); @@ -882,6 +892,15 @@ int namespace_check(struct ndctl_namespace *ndns, struct check_opts *opts) if (opts->verbose) bttc->ctx.log_priority = LOG_DEBUG; + memset(&act, 0, sizeof(act)); + act.sa_sigaction = sigbus_hdl; + act.sa_flags = SA_SIGINFO; + + if (sigaction(SIGBUS, &act, 0)) { + err(bttc, "Unable to set sigaction\n"); + return -errno; + } + bttc->opts = opts; bttc->start_off = BTT_START_OFFSET; bttc->sys_page_size = sysconf(_SC_PAGESIZE); @@ -949,6 +968,18 @@ int namespace_check(struct ndctl_namespace *ndns, struct check_opts *opts) goto out_sb; } + /* + * This is where we jump to if we receive a SIGBUS, prior to doing any + * mmaped reads, and can safely abort + */ + if (sigsetjmp(sj_env, 1)) { + err(bttc, "Received a SIGBUS\n"); + err(bttc, + "Metadata corruption found, recovery is not possible\n"); + rc = -EFAULT; + goto out_close; + } + rc = btt_info_read_verify(bttc, btt_sb, bttc->start_off); if (rc) { rc = btt_recover_first_sb(bttc);
If we have poison/badblocks in the BTT metadata sections, the mmap-reads happening in the checker will trigger a SIGBUS, and the program will halt abruptly. Add a sigbus handler which notifies the user of this, and prints out a relevant error message: namespace5.0: namespace_check: checking namespace5.0 namespace5.0: btt_discover_arenas: found 1 BTT arena namespace5.0: btt_check_arenas: checking arena 0 namespace5.0: namespace_check: Received a SIGBUS namespace5.0: namespace_check: Metadata corruption found, recovery is not possible error checking namespaces: Bad address Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> --- ndctl/check.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+)