diff mbox

nfit: add a module parameter to ignore ars errors

Message ID 1451946883-28092-1-git-send-email-vishal.l.verma@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Verma, Vishal L Jan. 4, 2016, 10:34 p.m. UTC
Normally, if a platform does not advertise support for Address Range
Scrub (ARS), we skip it. But if ARS is advertised, it is expected to
always succeed. If it fails, we normally fail initialization at that
point.

Add a module parameter to nfit that lets it ignore ARS failures and
continue with initialization for debugging.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---

This applies on top of both of the previous error handling series
(badblocks and libnvdimm poison list). The tree at:
https://git.kernel.org/cgit/linux/kernel/git/vishal/nvdimm.git/log/?h=err_handling_latest
has been updated with this patch.

 drivers/acpi/nfit.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Linda Knippers Jan. 6, 2016, 5:12 p.m. UTC | #1
On 1/4/2016 5:34 PM, Vishal Verma wrote:
> Normally, if a platform does not advertise support for Address Range
> Scrub (ARS), we skip it. But if ARS is advertised, it is expected to
> always succeed. If it fails, we normally fail initialization at that
> point.
> 
> Add a module parameter to nfit that lets it ignore ARS failures and
> continue with initialization for debugging.

Could ARS be so broken that you might want to just ignore it altogether
and not even make the requests?

-- ljk
> 
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
> 
> This applies on top of both of the previous error handling series
> (badblocks and libnvdimm poison list). The tree at:
> https://git.kernel.org/cgit/linux/kernel/git/vishal/nvdimm.git/log/?h=err_handling_latest
> has been updated with this patch.
> 
>  drivers/acpi/nfit.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
> index ad6d8c6..0a152f1 100644
> --- a/drivers/acpi/nfit.c
> +++ b/drivers/acpi/nfit.c
> @@ -34,6 +34,10 @@ static bool force_enable_dimms;
>  module_param(force_enable_dimms, bool, S_IRUGO|S_IWUSR);
>  MODULE_PARM_DESC(force_enable_dimms, "Ignore _STA (ACPI DIMM device) status");
>  
> +static bool ignore_ars;
> +module_param(ignore_ars, bool, S_IRUGO|S_IWUSR);
> +MODULE_PARM_DESC(ignore_ars, "Ignore ARS (Address Range Scrub) failures");
> +
>  struct nfit_table_prev {
>  	struct list_head spas;
>  	struct list_head memdevs;
> @@ -1786,7 +1790,10 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
>  			dev_err(acpi_desc->dev,
>  				"error while performing ARS to find poison: %d\n",
>  				rc);
> -			return rc;
> +			if (ignore_ars)
> +				; /* continue initialization */
> +			else
> +				return rc;
>  		}
>  		if (!nvdimm_pmem_region_create(nvdimm_bus, ndr_desc))
>  			return -ENOMEM;
>
Vishal Verma Jan. 7, 2016, 3:01 a.m. UTC | #2
On Wed, 2016-01-06 at 12:12 -0500, Linda Knippers wrote:
> On 1/4/2016 5:34 PM, Vishal Verma wrote:
> > Normally, if a platform does not advertise support for Address
> > Range
> > Scrub (ARS), we skip it. But if ARS is advertised, it is expected
> > to
> > always succeed. If it fails, we normally fail initialization at
> > that
> > point.
> > 
> > Add a module parameter to nfit that lets it ignore ARS failures and
> > continue with initialization for debugging.
> 
> Could ARS be so broken that you might want to just ignore it
> altogether
> and not even make the requests?
> 

That is a possibility, and I considered it, but I thought it might be
better to see how it fails and then just ignore the errors..
It boils down to how much we trust the firmware, and hopefully if it
advertises ARS as implemented, it should not be completely broken..

Dan, thoughts?
Dan Williams Jan. 7, 2016, 5:34 a.m. UTC | #3
On Wed, Jan 6, 2016 at 7:01 PM, Vishal Verma <vishal@kernel.org> wrote:
> On Wed, 2016-01-06 at 12:12 -0500, Linda Knippers wrote:
>> On 1/4/2016 5:34 PM, Vishal Verma wrote:
>> > Normally, if a platform does not advertise support for Address
>> > Range
>> > Scrub (ARS), we skip it. But if ARS is advertised, it is expected
>> > to
>> > always succeed. If it fails, we normally fail initialization at
>> > that
>> > point.
>> >
>> > Add a module parameter to nfit that lets it ignore ARS failures and
>> > continue with initialization for debugging.
>>
>> Could ARS be so broken that you might want to just ignore it
>> altogether
>> and not even make the requests?
>>
>
> That is a possibility, and I considered it, but I thought it might be
> better to see how it fails and then just ignore the errors..
> It boils down to how much we trust the firmware, and hopefully if it
> advertises ARS as implemented, it should not be completely broken..
>
> Dan, thoughts?
>

Hmm, once we add plumbing for bad block clearing / setting we'll have
the tools to workaround firmware with untrusted ars results.  i.e.
just manually correct false positive / negative entries.
Linda Knippers Jan. 7, 2016, 9:31 p.m. UTC | #4
On 1/7/2016 12:34 AM, Dan Williams wrote:
> On Wed, Jan 6, 2016 at 7:01 PM, Vishal Verma <vishal@kernel.org> wrote:
>> On Wed, 2016-01-06 at 12:12 -0500, Linda Knippers wrote:
>>> On 1/4/2016 5:34 PM, Vishal Verma wrote:
>>>> Normally, if a platform does not advertise support for Address
>>>> Range
>>>> Scrub (ARS), we skip it. But if ARS is advertised, it is expected
>>>> to
>>>> always succeed. If it fails, we normally fail initialization at
>>>> that
>>>> point.
>>>>
>>>> Add a module parameter to nfit that lets it ignore ARS failures and
>>>> continue with initialization for debugging.
>>>
>>> Could ARS be so broken that you might want to just ignore it
>>> altogether
>>> and not even make the requests?
>>>
>>
>> That is a possibility, and I considered it, but I thought it might be
>> better to see how it fails and then just ignore the errors..
>> It boils down to how much we trust the firmware, and hopefully if it
>> advertises ARS as implemented, it should not be completely broken..
>>
>> Dan, thoughts?
>>
> 
> Hmm, once we add plumbing for bad block clearing / setting we'll have
> the tools to workaround firmware with untrusted ars results.  i.e.
> just manually correct false positive / negative entries.

I was more worried about places where the code is looping waiting
for commands to complete and what happens with buggy firmware
but I've now commented on that patch.  Related to the parameter,
if we think we need to account for buggy firmware, we could be
vulnerable in more places this.

-- ljk

>
Dan Williams Jan. 7, 2016, 10:07 p.m. UTC | #5
On Thu, Jan 7, 2016 at 1:31 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
> On 1/7/2016 12:34 AM, Dan Williams wrote:
>> On Wed, Jan 6, 2016 at 7:01 PM, Vishal Verma <vishal@kernel.org> wrote:
>>> On Wed, 2016-01-06 at 12:12 -0500, Linda Knippers wrote:
>>>> On 1/4/2016 5:34 PM, Vishal Verma wrote:
>>>>> Normally, if a platform does not advertise support for Address
>>>>> Range
>>>>> Scrub (ARS), we skip it. But if ARS is advertised, it is expected
>>>>> to
>>>>> always succeed. If it fails, we normally fail initialization at
>>>>> that
>>>>> point.
>>>>>
>>>>> Add a module parameter to nfit that lets it ignore ARS failures and
>>>>> continue with initialization for debugging.
>>>>
>>>> Could ARS be so broken that you might want to just ignore it
>>>> altogether
>>>> and not even make the requests?
>>>>
>>>
>>> That is a possibility, and I considered it, but I thought it might be
>>> better to see how it fails and then just ignore the errors..
>>> It boils down to how much we trust the firmware, and hopefully if it
>>> advertises ARS as implemented, it should not be completely broken..
>>>
>>> Dan, thoughts?
>>>
>>
>> Hmm, once we add plumbing for bad block clearing / setting we'll have
>> the tools to workaround firmware with untrusted ars results.  i.e.
>> just manually correct false positive / negative entries.
>
> I was more worried about places where the code is looping waiting
> for commands to complete and what happens with buggy firmware
> but I've now commented on that patch.  Related to the parameter,
> if we think we need to account for buggy firmware, we could be
> vulnerable in more places this.

Yes, lets wait and see rather than pre-emptively working around
potentially bad firmware.
diff mbox

Patch

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index ad6d8c6..0a152f1 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -34,6 +34,10 @@  static bool force_enable_dimms;
 module_param(force_enable_dimms, bool, S_IRUGO|S_IWUSR);
 MODULE_PARM_DESC(force_enable_dimms, "Ignore _STA (ACPI DIMM device) status");
 
+static bool ignore_ars;
+module_param(ignore_ars, bool, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(ignore_ars, "Ignore ARS (Address Range Scrub) failures");
+
 struct nfit_table_prev {
 	struct list_head spas;
 	struct list_head memdevs;
@@ -1786,7 +1790,10 @@  static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
 			dev_err(acpi_desc->dev,
 				"error while performing ARS to find poison: %d\n",
 				rc);
-			return rc;
+			if (ignore_ars)
+				; /* continue initialization */
+			else
+				return rc;
 		}
 		if (!nvdimm_pmem_region_create(nvdimm_bus, ndr_desc))
 			return -ENOMEM;