Message ID | 1451946883-28092-1-git-send-email-vishal.l.verma@intel.com (mailing list archive) |
---|---|
State | New, archived |
On 1/4/2016 5:34 PM, Vishal Verma wrote:
> Normally, if a platform does not advertise support for Address Range
> Scrub (ARS), we skip it. But if ARS is advertised, it is expected to
> always succeed. If it fails, we normally fail initialization at that
> point.
>
> Add a module parameter to nfit that lets it ignore ARS failures and
> continue with initialization for debugging.

Could ARS be so broken that you might want to just ignore it altogether
and not even make the requests?

-- ljk

>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>
> This applies on top of both of the previous error handling series
> (badblocks and libnvdimm poison list). The tree at:
> https://git.kernel.org/cgit/linux/kernel/git/vishal/nvdimm.git/log/?h=err_handling_latest
> has been updated with this patch.
>
>  drivers/acpi/nfit.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
> index ad6d8c6..0a152f1 100644
> --- a/drivers/acpi/nfit.c
> +++ b/drivers/acpi/nfit.c
> @@ -34,6 +34,10 @@ static bool force_enable_dimms;
>  module_param(force_enable_dimms, bool, S_IRUGO|S_IWUSR);
>  MODULE_PARM_DESC(force_enable_dimms, "Ignore _STA (ACPI DIMM device) status");
>
> +static bool ignore_ars;
> +module_param(ignore_ars, bool, S_IRUGO|S_IWUSR);
> +MODULE_PARM_DESC(ignore_ars, "Ignore ARS (Address Range Scrub) failures");
> +
>  struct nfit_table_prev {
>  	struct list_head spas;
>  	struct list_head memdevs;
> @@ -1786,7 +1790,10 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
>  			dev_err(acpi_desc->dev,
>  				"error while performing ARS to find poison: %d\n",
>  				rc);
> -			return rc;
> +			if (ignore_ars)
> +				; /* continue initialization */
> +			else
> +				return rc;
>  		}
>  	if (!nvdimm_pmem_region_create(nvdimm_bus, ndr_desc))
>  		return -ENOMEM;
>

On Wed, 2016-01-06 at 12:12 -0500, Linda Knippers wrote:
> On 1/4/2016 5:34 PM, Vishal Verma wrote:
> > Normally, if a platform does not advertise support for Address Range
> > Scrub (ARS), we skip it. But if ARS is advertised, it is expected to
> > always succeed. If it fails, we normally fail initialization at that
> > point.
> >
> > Add a module parameter to nfit that lets it ignore ARS failures and
> > continue with initialization for debugging.
>
> Could ARS be so broken that you might want to just ignore it altogether
> and not even make the requests?
>

That is a possibility, and I considered it, but I thought it might be
better to see how it fails and then just ignore the errors..
It boils down to how much we trust the firmware, and hopefully if it
advertises ARS as implemented, it should not be completely broken..

Dan, thoughts?

On Wed, Jan 6, 2016 at 7:01 PM, Vishal Verma <vishal@kernel.org> wrote:
> On Wed, 2016-01-06 at 12:12 -0500, Linda Knippers wrote:
>> On 1/4/2016 5:34 PM, Vishal Verma wrote:
>> > Normally, if a platform does not advertise support for Address Range
>> > Scrub (ARS), we skip it. But if ARS is advertised, it is expected to
>> > always succeed. If it fails, we normally fail initialization at that
>> > point.
>> >
>> > Add a module parameter to nfit that lets it ignore ARS failures and
>> > continue with initialization for debugging.
>>
>> Could ARS be so broken that you might want to just ignore it altogether
>> and not even make the requests?
>>
>
> That is a possibility, and I considered it, but I thought it might be
> better to see how it fails and then just ignore the errors..
> It boils down to how much we trust the firmware, and hopefully if it
> advertises ARS as implemented, it should not be completely broken..
>
> Dan, thoughts?
>

Hmm, once we add plumbing for bad block clearing / setting we'll have
the tools to work around firmware with untrusted ARS results, i.e.
just manually correct false positive / negative entries.

On 1/7/2016 12:34 AM, Dan Williams wrote:
> On Wed, Jan 6, 2016 at 7:01 PM, Vishal Verma <vishal@kernel.org> wrote:
>> On Wed, 2016-01-06 at 12:12 -0500, Linda Knippers wrote:
>>> On 1/4/2016 5:34 PM, Vishal Verma wrote:
>>>> Normally, if a platform does not advertise support for Address Range
>>>> Scrub (ARS), we skip it. But if ARS is advertised, it is expected to
>>>> always succeed. If it fails, we normally fail initialization at that
>>>> point.
>>>>
>>>> Add a module parameter to nfit that lets it ignore ARS failures and
>>>> continue with initialization for debugging.
>>>
>>> Could ARS be so broken that you might want to just ignore it altogether
>>> and not even make the requests?
>>>
>>
>> That is a possibility, and I considered it, but I thought it might be
>> better to see how it fails and then just ignore the errors..
>> It boils down to how much we trust the firmware, and hopefully if it
>> advertises ARS as implemented, it should not be completely broken..
>>
>> Dan, thoughts?
>>
>
> Hmm, once we add plumbing for bad block clearing / setting we'll have
> the tools to work around firmware with untrusted ARS results, i.e.
> just manually correct false positive / negative entries.

I was more worried about places where the code is looping waiting
for commands to complete and what happens with buggy firmware,
but I've now commented on that patch. Related to the parameter,
if we think we need to account for buggy firmware, we could be
vulnerable in more places than this.

-- ljk

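The worry about loops that wait for a command to complete can be made concrete with a small user-space sketch. This is only an illustration, not code from the nfit driver: `ars_wait_bounded`, `ars_query_fn`, `fake_firmware`, and the status values are all hypothetical names invented here. It shows one way a polling loop might give up after a fixed number of attempts instead of spinning forever when firmware never reports completion.

```c
#include <stddef.h>

/* Hypothetical status codes; not the values used by the NFIT driver. */
enum ars_status { ARS_DONE = 0, ARS_IN_PROGRESS = 1 };

/* Hypothetical callback standing in for a firmware status query. */
typedef enum ars_status (*ars_query_fn)(void *ctx);

/*
 * Poll until the command completes or max_polls attempts are used up.
 * Returns 0 on completion, -1 if completion was never reported.
 */
static int ars_wait_bounded(ars_query_fn query, void *ctx,
			    unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (query(ctx) == ARS_DONE)
			return 0;
		/* A real driver would sleep or back off here. */
	}
	return -1;
}

/* Toy firmware model: reports completion once *remaining reaches zero. */
static enum ars_status fake_firmware(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return ARS_DONE;
	(*remaining)--;
	return ARS_IN_PROGRESS;
}
```

With the toy model, a command that needs three polls completes within a budget of ten, while one that needs a thousand is reported as a failure rather than hanging the caller.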
On Thu, Jan 7, 2016 at 1:31 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
> On 1/7/2016 12:34 AM, Dan Williams wrote:
>> On Wed, Jan 6, 2016 at 7:01 PM, Vishal Verma <vishal@kernel.org> wrote:
>>> On Wed, 2016-01-06 at 12:12 -0500, Linda Knippers wrote:
>>>> On 1/4/2016 5:34 PM, Vishal Verma wrote:
>>>>> Normally, if a platform does not advertise support for Address Range
>>>>> Scrub (ARS), we skip it. But if ARS is advertised, it is expected to
>>>>> always succeed. If it fails, we normally fail initialization at that
>>>>> point.
>>>>>
>>>>> Add a module parameter to nfit that lets it ignore ARS failures and
>>>>> continue with initialization for debugging.
>>>>
>>>> Could ARS be so broken that you might want to just ignore it altogether
>>>> and not even make the requests?
>>>>
>>>
>>> That is a possibility, and I considered it, but I thought it might be
>>> better to see how it fails and then just ignore the errors..
>>> It boils down to how much we trust the firmware, and hopefully if it
>>> advertises ARS as implemented, it should not be completely broken..
>>>
>>> Dan, thoughts?
>>>
>>
>> Hmm, once we add plumbing for bad block clearing / setting we'll have
>> the tools to work around firmware with untrusted ARS results, i.e.
>> just manually correct false positive / negative entries.
>
> I was more worried about places where the code is looping waiting
> for commands to complete and what happens with buggy firmware,
> but I've now commented on that patch. Related to the parameter,
> if we think we need to account for buggy firmware, we could be
> vulnerable in more places than this.

Yes, let's wait and see rather than pre-emptively working around
potentially bad firmware.

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index ad6d8c6..0a152f1 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -34,6 +34,10 @@ static bool force_enable_dimms;
 module_param(force_enable_dimms, bool, S_IRUGO|S_IWUSR);
 MODULE_PARM_DESC(force_enable_dimms, "Ignore _STA (ACPI DIMM device) status");
 
+static bool ignore_ars;
+module_param(ignore_ars, bool, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(ignore_ars, "Ignore ARS (Address Range Scrub) failures");
+
 struct nfit_table_prev {
 	struct list_head spas;
 	struct list_head memdevs;
@@ -1786,7 +1790,10 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
 			dev_err(acpi_desc->dev,
 				"error while performing ARS to find poison: %d\n",
 				rc);
-			return rc;
+			if (ignore_ars)
+				; /* continue initialization */
+			else
+				return rc;
 		}
 	if (!nvdimm_pmem_region_create(nvdimm_bus, ndr_desc))
 		return -ENOMEM;

Normally, if a platform does not advertise support for Address Range
Scrub (ARS), we skip it. But if ARS is advertised, it is expected to
always succeed. If it fails, we normally fail initialization at that
point.

Add a module parameter to nfit that lets it ignore ARS failures and
continue with initialization for debugging.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---

This applies on top of both of the previous error handling series
(badblocks and libnvdimm poison list). The tree at:
https://git.kernel.org/cgit/linux/kernel/git/vishal/nvdimm.git/log/?h=err_handling_latest
has been updated with this patch.

 drivers/acpi/nfit.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
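
For readers who want to see the intended behavior in isolation, the following is a minimal user-space model of the branch the patch changes. It is a sketch under stated assumptions, not the driver code: `register_region_sketch` and its `ars_rc` and `create_ok` parameters are hypothetical stand-ins for the ARS result and for the outcome of region creation. The empty-statement form `if (ignore_ars) ; else return rc;` from the patch is written here as the equivalent `if (!ignore_ars) return rc;`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Model of the patched error path.
 *   ars_rc     - hypothetical result of the scrub (0 means success)
 *   ignore_ars - models the proposed module parameter
 *   create_ok  - hypothetical stand-in for region creation succeeding
 */
static int register_region_sketch(int ars_rc, bool ignore_ars, bool create_ok)
{
	if (ars_rc) {
		fprintf(stderr,
			"error while performing ARS to find poison: %d\n",
			ars_rc);
		if (!ignore_ars)
			return ars_rc;
		/* ignore_ars is set: log the failure and keep initializing */
	}
	if (!create_ok)
		return -ENOMEM;
	return 0;
}
```

In this model the ARS error is still logged when `ignore_ars` is set; only the early return is suppressed, so a later failure such as `-ENOMEM` from region creation is still reported to the caller.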