
libnvdimm, pmem: fix badblocks notification crash

Message ID 149333101097.4714.1923436715100717938.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State New, archived

Commit Message

Dan Williams April 27, 2017, 10:10 p.m. UTC
The nd_pmem_notify() routine is called whenever an ARS
(address-range-scrub) completes to communicate results to the
per-namespace badblocks instances.

When the namespace is in btt mode we crash because we do not allocate a
struct pmem_device instance in that case, resulting in the following
crash signature:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
 IP: nd_pmem_notify+0x30/0xf0 [nd_pmem]
 Call Trace:
  nd_device_notify+0x40/0x50
  child_notify+0x10/0x20
  device_for_each_child+0x50/0x90
  nd_region_notify+0x20/0x30
  nd_device_notify+0x40/0x50
  nvdimm_region_notify+0x27/0x30
  acpi_nfit_scrub+0x341/0x590 [nfit]
  process_one_work+0x197/0x450
  worker_thread+0x4e/0x4a0
  kthread+0x109/0x140

Given that we don't even populate the btt badblocks instance, just
return early and skip the device to region lookup.

This is a simpler version of the original fix by Toshi [1].

[1]: https://patchwork.kernel.org/patch/9700055/

Cc: Vishal Verma <vishal.l.verma@intel.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/pmem.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

Comments

Kani, Toshi April 27, 2017, 10:25 p.m. UTC | #1
On Thu, 2017-04-27 at 15:10 -0700, Dan Williams wrote:
> The nd_pmem_notify() routine is called whenever an ARS
> (address-range-scrub) completes to communicate results to the
> per-namespace badblocks instances.
>
> When the namespace is in btt mode we crash because we do not allocate
> a struct pmem_device instance in that case. Resulting in the
> following crash signature:
>
>  BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000030
>  IP: nd_pmem_notify+0x30/0xf0 [nd_pmem]
>  Call Trace:
>   nd_device_notify+0x40/0x50
>   child_notify+0x10/0x20
>   device_for_each_child+0x50/0x90
>   nd_region_notify+0x20/0x30
>   nd_device_notify+0x40/0x50
>   nvdimm_region_notify+0x27/0x30
>   acpi_nfit_scrub+0x341/0x590 [nfit]
>   process_one_work+0x197/0x450
>   worker_thread+0x4e/0x4a0
>   kthread+0x109/0x140
>
> Given that we don't even populate the btt badblocks instance, just
> return early and skip the device to region lookup.

We populate the btt badblocks into nsio->bb, and check/clear them in
nsio_rw_bytes().

Thanks,
-Toshi
Dan Williams April 27, 2017, 10:26 p.m. UTC | #2
On Thu, Apr 27, 2017 at 3:25 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> On Thu, 2017-04-27 at 15:10 -0700, Dan Williams wrote:
>> The nd_pmem_notify() routine is called whenever an ARS
>> (address-range-scrub) completes to communicate results to the
>> per-namespace badblocks instances.
>>
>> When the namespace is in btt mode we crash because we do not allocate
>> a struct pmem_device instance in that case. Resulting in the
>> following crash signature:
>>
>>  BUG: unable to handle kernel NULL pointer dereference at
>> 0000000000000030
>>  IP: nd_pmem_notify+0x30/0xf0 [nd_pmem]
>>  Call Trace:
>>   nd_device_notify+0x40/0x50
>>   child_notify+0x10/0x20
>>   device_for_each_child+0x50/0x90
>>   nd_region_notify+0x20/0x30
>>   nd_device_notify+0x40/0x50
>>   nvdimm_region_notify+0x27/0x30
>>   acpi_nfit_scrub+0x341/0x590 [nfit]
>>   process_one_work+0x197/0x450
>>   worker_thread+0x4e/0x4a0
>>   kthread+0x109/0x140
>>
>> Given that we don't even populate the btt badblocks instance, just
>> return early and skip the device to region lookup.
>
> We populate the btt badblocks into nsio->bb, and check/clear them in
> nsio_rw_bytes().

Argh, yes, we don't populate them out to the disk badblocks. I'll go
with your patch.
Kani, Toshi April 27, 2017, 10:28 p.m. UTC | #3
On Thu, 2017-04-27 at 15:26 -0700, Dan Williams wrote:
> On Thu, Apr 27, 2017 at 3:25 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> > On Thu, 2017-04-27 at 15:10 -0700, Dan Williams wrote:
> > > The nd_pmem_notify() routine is called whenever an ARS
> > > (address-range-scrub) completes to communicate results to the
> > > per-namespace badblocks instances.
> > >
> > > When the namespace is in btt mode we crash because we do not
> > > allocate a struct pmem_device instance in that case. Resulting in
> > > the following crash signature:
> > >
> > >  BUG: unable to handle kernel NULL pointer dereference at
> > > 0000000000000030
> > >  IP: nd_pmem_notify+0x30/0xf0 [nd_pmem]
> > >  Call Trace:
> > >   nd_device_notify+0x40/0x50
> > >   child_notify+0x10/0x20
> > >   device_for_each_child+0x50/0x90
> > >   nd_region_notify+0x20/0x30
> > >   nd_device_notify+0x40/0x50
> > >   nvdimm_region_notify+0x27/0x30
> > >   acpi_nfit_scrub+0x341/0x590 [nfit]
> > >   process_one_work+0x197/0x450
> > >   worker_thread+0x4e/0x4a0
> > >   kthread+0x109/0x140
> > >
> > > Given that we don't even populate the btt badblocks instance,
> > > just return early and skip the device to region lookup.
> >
> > We populate the btt badblocks into nsio->bb, and check/clear them
> > in nsio_rw_bytes().
>
> Argh, yes, we don't populate them out to the disk badblocks. I'll go
> with your patch.

Thanks!
-Toshi

Patch

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 5b536be5a12e..ee6cd31dafcf 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -388,21 +388,21 @@ static void nd_pmem_shutdown(struct device *dev)
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
 {
-	struct pmem_device *pmem = dev_get_drvdata(dev);
-	struct nd_region *nd_region = to_region(pmem);
 	resource_size_t offset = 0, end_trunc = 0;
 	struct nd_namespace_common *ndns;
 	struct nd_namespace_io *nsio;
+	struct nd_region *nd_region;
+	struct pmem_device *pmem;
 	struct resource res;
 
 	if (event != NVDIMM_REVALIDATE_POISON)
 		return;
 
-	if (is_nd_btt(dev)) {
-		struct nd_btt *nd_btt = to_nd_btt(dev);
+	/* no badblocks instance to update in the btt case */
+	if (is_nd_btt(dev))
+		return;
 
-		ndns = nd_btt->ndns;
-	} else if (is_nd_pfn(dev)) {
+	if (is_nd_pfn(dev)) {
 		struct nd_pfn *nd_pfn = to_nd_pfn(dev);
 		struct nd_pfn_sb *pfn_sb = nd_pfn->pfn_sb;
 
@@ -415,6 +415,8 @@ static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
 	nsio = to_nd_namespace_io(&ndns->dev);
 	res.start = nsio->res.start + offset;
 	res.end = nsio->res.end - end_trunc;
+	pmem = dev_get_drvdata(dev);
+	nd_region = to_region(pmem);
 	nvdimm_badblocks_populate(nd_region, &pmem->bb, &res);
 }