diff mbox series

nvdimm: nvdimm_bus_register: Avoid adding device to the unregistered bus

Message ID 1678955931-2-1-git-send-email-lizhijian@fujitsu.com (mailing list archive)
State New, archived
Headers show
Series nvdimm: nvdimm_bus_register: Avoid adding device to the unregistered bus | expand

Commit Message

Zhijian Li (Fujitsu) March 16, 2023, 8:38 a.m. UTC
nvdimm_bus_register() could be called from other modules, such as nfit,
but it can only be called after the nvdimm_bus_type is registered.

 BUG: kernel NULL pointer dereference, address: 0000000000000098
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
 RIP: 0010:bus_add_device+0x58/0x150
 Call Trace:
  <TASK>
  device_add+0x3ac/0x980
  nvdimm_bus_register+0x16d/0x1d0
  acpi_nfit_init+0xb72/0x1f90 [nfit]
  acpi_nfit_add+0x1d5/0x200 [nfit]
  acpi_device_probe+0x45/0x160
  really_probe+0xce/0x390
  __driver_probe_device+0x78/0x180
  driver_probe_device+0x1e/0x90
  __driver_attach+0xd6/0x1d0
  bus_for_each_dev+0x7b/0xc0
  bus_add_driver+0x1ac/0x200
  driver_register+0x8f/0xf0
  nfit_init+0x164/0xff0 [nfit]
  do_one_initcall+0x5b/0x320
  do_init_module+0x4c/0x1f0
  __do_sys_finit_module+0xb4/0x130
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x72/0xdc

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 drivers/nvdimm/bus.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Dan Williams March 16, 2023, 3:54 p.m. UTC | #1
Li Zhijian wrote:
> nvdimm_bus_register() could be called from other modules, such as nfit,
> but it can only be called after the nvdimm_bus_type is registered.
> 
>  BUG: kernel NULL pointer dereference, address: 0000000000000098
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  PGD 0 P4D 0
>  Oops: 0000 [#1] PREEMPT SMP PTI
>  CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
>  RIP: 0010:bus_add_device+0x58/0x150
>  Call Trace:
>   <TASK>
>   device_add+0x3ac/0x980
>   nvdimm_bus_register+0x16d/0x1d0
>   acpi_nfit_init+0xb72/0x1f90 [nfit]
>   acpi_nfit_add+0x1d5/0x200 [nfit]
>   acpi_device_probe+0x45/0x160

Can you explain a bit more how to hit this crash? This has not been a
problem historically and the explanation above makes it sound like this
is a theoretical issue.

libnvdimm_init() *should* be done before the nfit driver can attempt
nvdimm_bus_register(). So, something else is broken if
nvdimm_bus_register() can be called before libnvdimm_init(), or after
libnvdimm_exit().
Zhijian Li (Fujitsu) March 17, 2023, 1:42 a.m. UTC | #2
On 16/03/2023 23:54, Dan Williams wrote:
> Li Zhijian wrote:
>> nvdimm_bus_register() could be called from other modules, such as nfit,
>> but it can only be called after the nvdimm_bus_type is registered.
>>
>>   BUG: kernel NULL pointer dereference, address: 0000000000000098
>>   #PF: supervisor read access in kernel mode
>>   #PF: error_code(0x0000) - not-present page
>>   PGD 0 P4D 0
>>   Oops: 0000 [#1] PREEMPT SMP PTI
>>   CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97
>>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
>>   RIP: 0010:bus_add_device+0x58/0x150
>>   Call Trace:
>>    <TASK>
>>    device_add+0x3ac/0x980
>>    nvdimm_bus_register+0x16d/0x1d0
>>    acpi_nfit_init+0xb72/0x1f90 [nfit]
>>    acpi_nfit_add+0x1d5/0x200 [nfit]
>>    acpi_device_probe+0x45/0x160
> 
> Can you explain a bit more how to hit this crash? This has not been a
> problem historically and the explanation above makes it sound like this
> is a theoretical issue.
> 

Dan,

Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter
'initcall_blacklist=libnvdimm_init'. Then kernel panic!
Theoretically, it will also happen if nvdimm_bus_register() failed.


For kdump purpose[1], we need to disable libnvdimm driver to ensure metadata in pmem will not be updated again in kdump kernel
[1]https://lore.kernel.org/linux-mm/3c752fc2-b6a0-2975-ffec-dba3edcf4155@fujitsu.com/T/

Thanks
Zhijian

> libnvdimm_init() *should* be done before the nfit driver can attempt
> nvdimm_bus_register(). So, something else is broken if
> nvdimm_bus_register() can be called before libnvdimm_init(), or after
> libnvdimm_exit().
Dan Williams March 17, 2023, 2:26 a.m. UTC | #3
lizhijian@fujitsu.com wrote:
> 
> 
> On 16/03/2023 23:54, Dan Williams wrote:
> > Li Zhijian wrote:
> >> nvdimm_bus_register() could be called from other modules, such as nfit,
> >> but it can only be called after the nvdimm_bus_type is registered.
> >>
> >>   BUG: kernel NULL pointer dereference, address: 0000000000000098
> >>   #PF: supervisor read access in kernel mode
> >>   #PF: error_code(0x0000) - not-present page
> >>   PGD 0 P4D 0
> >>   Oops: 0000 [#1] PREEMPT SMP PTI
> >>   CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97
> >>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> >>   RIP: 0010:bus_add_device+0x58/0x150
> >>   Call Trace:
> >>    <TASK>
> >>    device_add+0x3ac/0x980
> >>    nvdimm_bus_register+0x16d/0x1d0
> >>    acpi_nfit_init+0xb72/0x1f90 [nfit]
> >>    acpi_nfit_add+0x1d5/0x200 [nfit]
> >>    acpi_device_probe+0x45/0x160
> > 
> > Can you explain a bit more how to hit this crash? This has not been a
> > problem historically and the explanation above makes it sound like this
> > is a theoretical issue.
> > 
> 
> Dan,
> 
> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter
> 'initcall_blacklist=libnvdimm_init'. Then kernel panic!
> Theoretically, it will also happen if nvdimm_bus_register() failed.
> 
> 
> For kdump purpose[1], we need to disable libnvdimm driver to ensure metadata in pmem will not be updated again in kdump kernel
> [1]https://lore.kernel.org/linux-mm/3c752fc2-b6a0-2975-ffec-dba3edcf4155@fujitsu.com/T/

Ah, great write up! Let me give that some thought. Apologies for missing
it earlier.

This would have been a good use for:

Link: https://lore.kernel.org/linux-mm/3c752fc2-b6a0-2975-ffec-dba3edcf4155@fujitsu.com

...in the above changelog.
Zhijian Li (Fujitsu) March 17, 2023, 5:37 a.m. UTC | #4
On 17/03/2023 10:26, Dan Williams wrote:
> lizhijian@fujitsu.com wrote:
>>
>>
>> On 16/03/2023 23:54, Dan Williams wrote:
>>> Li Zhijian wrote:
>>>> nvdimm_bus_register() could be called from other modules, such as nfit,
>>>> but it can only be called after the nvdimm_bus_type is registered.
>>>>
>>>>    BUG: kernel NULL pointer dereference, address: 0000000000000098
>>>>    #PF: supervisor read access in kernel mode
>>>>    #PF: error_code(0x0000) - not-present page
>>>>    PGD 0 P4D 0
>>>>    Oops: 0000 [#1] PREEMPT SMP PTI
>>>>    CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97
>>>>    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
>>>>    RIP: 0010:bus_add_device+0x58/0x150
>>>>    Call Trace:
>>>>     <TASK>
>>>>     device_add+0x3ac/0x980
>>>>     nvdimm_bus_register+0x16d/0x1d0
>>>>     acpi_nfit_init+0xb72/0x1f90 [nfit]
>>>>     acpi_nfit_add+0x1d5/0x200 [nfit]
>>>>     acpi_device_probe+0x45/0x160
>>>
>>> Can you explain a bit more how to hit this crash? This has not been a
>>> problem historically and the explanation above makes it sound like this
>>> is a theoretical issue.
>>>
>>
>> Dan,
>>
>> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter
>> 'initcall_blacklist=libnvdimm_init'. Then kernel panic!
>> Theoretically, it will also happen if nvdimm_bus_register() failed.
>>
>>
>> For kdump purpose[1], we need to disable libnvdimm driver to ensure metadata in pmem will not be updated again in kdump kernel
>> [1]https://lore.kernel.org/linux-mm/3c752fc2-b6a0-2975-ffec-dba3edcf4155@fujitsu.com/T/
> 
> Ah, great write up! Let me give that some thought. 

That would be great.


> Apologies for missing it earlier.

Never mind :)



> 
> This would have been a good use for:
> 
> Link: https://lore.kernel.org/linux-mm/3c752fc2-b6a0-2975-ffec-dba3edcf4155@fujitsu.com
> 
> ...in the above changelog.


sounds good, i will update it in next version.


Thanks
Zhijian
Dan Williams March 17, 2023, 5:59 a.m. UTC | #5
lizhijian@fujitsu.com wrote:
> 
> 
> On 16/03/2023 23:54, Dan Williams wrote:
> > Li Zhijian wrote:
> >> nvdimm_bus_register() could be called from other modules, such as nfit,
> >> but it can only be called after the nvdimm_bus_type is registered.
> >>
> >>   BUG: kernel NULL pointer dereference, address: 0000000000000098
> >>   #PF: supervisor read access in kernel mode
> >>   #PF: error_code(0x0000) - not-present page
> >>   PGD 0 P4D 0
> >>   Oops: 0000 [#1] PREEMPT SMP PTI
> >>   CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97
> >>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> >>   RIP: 0010:bus_add_device+0x58/0x150
> >>   Call Trace:
> >>    <TASK>
> >>    device_add+0x3ac/0x980
> >>    nvdimm_bus_register+0x16d/0x1d0
> >>    acpi_nfit_init+0xb72/0x1f90 [nfit]
> >>    acpi_nfit_add+0x1d5/0x200 [nfit]
> >>    acpi_device_probe+0x45/0x160
> > 
> > Can you explain a bit more how to hit this crash? This has not been a
> > problem historically and the explanation above makes it sound like this
> > is a theoretical issue.
> > 
> 
> Dan,
> 
> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter
> 'initcall_blacklist=libnvdimm_init'. Then kernel panic!

That's expected though, you can't block libnvdimm_init and then expect
modules that link to libnvdimm to work. You would also need to block all
modules / initcalls that depend on libnvdimm_init having run. I'll
respond to the other thread with some ideas.
Zhijian Li (Fujitsu) March 20, 2023, 2:06 a.m. UTC | #6
On 17/03/2023 13:59, Dan Williams wrote:
> lizhijian@fujitsu.com wrote:
>>
>>
>> On 16/03/2023 23:54, Dan Williams wrote:
>>> Li Zhijian wrote:
>>>> nvdimm_bus_register() could be called from other modules, such as nfit,
>>>> but it can only be called after the nvdimm_bus_type is registered.
>>>>
>>>>    BUG: kernel NULL pointer dereference, address: 0000000000000098
>>>>    #PF: supervisor read access in kernel mode
>>>>    #PF: error_code(0x0000) - not-present page
>>>>    PGD 0 P4D 0
>>>>    Oops: 0000 [#1] PREEMPT SMP PTI
>>>>    CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97
>>>>    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
>>>>    RIP: 0010:bus_add_device+0x58/0x150
>>>>    Call Trace:
>>>>     <TASK>
>>>>     device_add+0x3ac/0x980
>>>>     nvdimm_bus_register+0x16d/0x1d0
>>>>     acpi_nfit_init+0xb72/0x1f90 [nfit]
>>>>     acpi_nfit_add+0x1d5/0x200 [nfit]
>>>>     acpi_device_probe+0x45/0x160
>>>
>>> Can you explain a bit more how to hit this crash? This has not been a
>>> problem historically and the explanation above makes it sound like this
>>> is a theoretical issue.
>>>
>>
>> Dan,
>>
>> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter
>> 'initcall_blacklist=libnvdimm_init'. Then kernel panic!
> 
> That's expected though,

Do you mean we just keep it as it is.


> you can't block libnvdimm_init and then expect
> modules that link to libnvdimm to work.
Ah, we would rather see it *unable to work* than panic, isn't it.



Thanks
Zhijian


> You would also need to block all
> modules / initcalls that depend on libnvdimm_init having runI'll
> respond to the other thread with some ideas.
Dan Williams March 20, 2023, 5:30 p.m. UTC | #7
lizhijian@fujitsu.com wrote:
[..]
> >> Dan,
> >>
> >> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter
> >> 'initcall_blacklist=libnvdimm_init'. Then kernel panic!
> > 
> > That's expected though,
> 
> Do you mean we just keep it as it is.

Yes.

> 
> 
> > you can't block libnvdimm_init and then expect
> > modules that link to libnvdimm to work.
> Ah, we would rather see it *unable to work* than panic, isn't it.

That part is true, but consider the implications of adding error
handling to all code that can no longer depend on initcall ordering, not
just libnvdimm. This would be a large paradigm shift.

Now I do think it would be a good idea to fail device_add() if the bus
is not registered, but I *think* that happens now as a result of:

5221b82d46f2 driver core: bus: bus_add/probe/remove_device() cleanups

...can you double check if you have that commit in your tests?
Zhijian Li (Fujitsu) March 21, 2023, 2:17 a.m. UTC | #8
On 21/03/2023 01:30, Dan Williams wrote:
> lizhijian@fujitsu.com wrote:
> [..]
>>>> Dan,
>>>>
>>>> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter
>>>> 'initcall_blacklist=libnvdimm_init'. Then kernel panic!
>>>
>>> That's expected though,
>>
>> Do you mean we just keep it as it is.
> 
> Yes.
> 
>>
>>
>>> you can't block libnvdimm_init and then expect
>>> modules that link to libnvdimm to work.
>> Ah, we would rather see it *unable to work* than panic, isn't it.
> 
> That part is true, but consider the implications of adding error
> handling to all code that can no longer depend on initcall ordering, not
> just libnvdimm. This would be a large paradigm shift.
> 
> Now I do think it would be a good idea to fail device_add() if the bus
> is not registered, but I *think* that happens now as a result of:
> 
> 5221b82d46f2 driver core: bus: bus_add/probe/remove_device() cleanups
> 
> ...can you double check if you have that commit in your tests?

Great, panic is gone after i upgraded to the upstream!


> Now I do think it would be a good idea to fail device_add() if the bus
> is not registered,

BTW, below line 369: device_add() didn't fail in practical.

  354         mutex_init(&nvdimm_bus->reconfig_mutex);
  355         badrange_init(&nvdimm_bus->badrange);
  356         nvdimm_bus->nd_desc = nd_desc;
  357         nvdimm_bus->dev.parent = parent;
  358         nvdimm_bus->dev.type = &nvdimm_bus_dev_type;
  359         nvdimm_bus->dev.groups = nd_desc->attr_groups;
  360         nvdimm_bus->dev.bus = &nvdimm_bus_type;
  361         nvdimm_bus->dev.of_node = nd_desc->of_node;
  362         device_initialize(&nvdimm_bus->dev);
  363         lockdep_set_class(&nvdimm_bus->dev.mutex, &nvdimm_bus_key);
  364         device_set_pm_not_required(&nvdimm_bus->dev);
  365         rc = dev_set_name(&nvdimm_bus->dev, "ndbus%d", nvdimm_bus->id);
  366         if (rc)
  367                 goto err;
  368
  369         rc = device_add(&nvdimm_bus->dev);
  370         dev_err(&nvdimm_bus->dev, "registration failed: %d\n", rc);

Thanks
Zhijian
Dan Williams March 21, 2023, 2:38 a.m. UTC | #9
lizhijian@fujitsu.com wrote:
[..]
> > Now I do think it would be a good idea to fail device_add() if the bus
> > is not registered,
> 
> BTW, below line 369: device_add() didn't fail in practical.
> 

I think that's ok because the device gets added, but never probed due to
this part of that commit I referenced:

@@ -503,20 +517,21 @@ int bus_add_device(struct device *dev)
  */
 void bus_probe_device(struct device *dev)
 {
-       struct bus_type *bus = dev->bus;
+       struct subsys_private *sp = bus_to_subsys(dev->bus);
        struct subsys_interface *sif;
 
-       if (!bus)
+       if (!sp)
                return;
 

...so it does what you want which is disable the libnvdimm subsystem
from binding to any devices without crashing.
Zhijian Li (Fujitsu) March 21, 2023, 3:19 a.m. UTC | #10
On 21/03/2023 10:38, Dan Williams wrote:
> lizhijian@fujitsu.com wrote:
> [..]
>>> Now I do think it would be a good idea to fail device_add() if the bus
>>> is not registered,
>>
>> BTW, below line 369: device_add() didn't fail in practical.
>>
> 
> I think that's ok because the device gets added, but never probed due to
> this part of that commit I referenced:

Very thanks for your explanation.

> 
> @@ -503,20 +517,21 @@ int bus_add_device(struct device *dev)
>    */
>   void bus_probe_device(struct device *dev)
>   {
> -       struct bus_type *bus = dev->bus;
> +       struct subsys_private *sp = bus_to_subsys(dev->bus);
>          struct subsys_interface *sif;
>   
> -       if (!bus)
> +       if (!sp)
>                  return;
>   
> 
> ...so it does what you want which is disable the libnvdimm subsystem
> from binding to any devices without crashing.

Yes, it's fine enough to me.
diff mbox series

Patch

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index ada61bbf49c1..ea66053072cb 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -28,6 +28,7 @@ 
 static int nvdimm_bus_major;
 struct class *nd_class;
 static DEFINE_IDA(nd_ida);
+static bool nvdimm_bus_type_registered;
 
 static int to_nd_device_type(struct device *dev)
 {
@@ -337,6 +338,10 @@  struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
 	struct nvdimm_bus *nvdimm_bus;
 	int rc;
 
+	if (!nvdimm_bus_type_registered) {
+		pr_warn("nvdimm bus type is not registered\n");
+		return NULL;
+	}
 	nvdimm_bus = kzalloc(sizeof(*nvdimm_bus), GFP_KERNEL);
 	if (!nvdimm_bus)
 		return NULL;
@@ -1321,6 +1326,7 @@  int __init nvdimm_bus_init(void)
 	if (rc)
 		goto err_nd_bus;
 
+	nvdimm_bus_type_registered = true;
 	return 0;
 
  err_nd_bus:
@@ -1343,4 +1349,5 @@  void nvdimm_bus_exit(void)
 	unregister_chrdev(nvdimm_major, "dimmctl");
 	bus_unregister(&nvdimm_bus_type);
 	ida_destroy(&nd_ida);
+	nvdimm_bus_type_registered = false;
 }