Message ID | 1678955931-2-1-git-send-email-lizhijian@fujitsu.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | nvdimm: nvdimm_bus_register: Avoid adding device to the unregistered bus | expand |
Li Zhijian wrote: > nvdimm_bus_register() could be called from other modules, such as nfit, > but it can only be called after the nvdimm_bus_type is registered. > > BUG: kernel NULL pointer dereference, address: 0000000000000098 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > PGD 0 P4D 0 > Oops: 0000 [#1] PREEMPT SMP PTI > CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 > RIP: 0010:bus_add_device+0x58/0x150 > Call Trace: > <TASK> > device_add+0x3ac/0x980 > nvdimm_bus_register+0x16d/0x1d0 > acpi_nfit_init+0xb72/0x1f90 [nfit] > acpi_nfit_add+0x1d5/0x200 [nfit] > acpi_device_probe+0x45/0x160 Can you explain a bit more how to hit this crash? This has not been a problem historically and the explanation above makes it sound like this is a theoretical issue. libnvdimm_init() *should* be done before the nfit driver can attempt nvdimm_bus_register(). So, something else is broken if nvdimm_bus_register() can be called before libnvdimm_init(), or after libnvdimm_exit().
On 16/03/2023 23:54, Dan Williams wrote: > Li Zhijian wrote: >> nvdimm_bus_register() could be called from other modules, such as nfit, >> but it can only be called after the nvdimm_bus_type is registered. >> >> BUG: kernel NULL pointer dereference, address: 0000000000000098 >> #PF: supervisor read access in kernel mode >> #PF: error_code(0x0000) - not-present page >> PGD 0 P4D 0 >> Oops: 0000 [#1] PREEMPT SMP PTI >> CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 >> RIP: 0010:bus_add_device+0x58/0x150 >> Call Trace: >> <TASK> >> device_add+0x3ac/0x980 >> nvdimm_bus_register+0x16d/0x1d0 >> acpi_nfit_init+0xb72/0x1f90 [nfit] >> acpi_nfit_add+0x1d5/0x200 [nfit] >> acpi_device_probe+0x45/0x160 > > Can you explain a bit more how to hit this crash? This has not been a > problem historically and the explanation above makes it sound like this > is a theoretical issue. > Dan, Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter 'initcall_blacklist=libnvdimm_init'. Then kernel panic! Theoretically, it will also happen if nvdimm_bus_register() failed. For kdump purpose[1], we need to disable libnvdimm driver to ensure metadata in pmem will not be updated again in kdump kernel [1]https://lore.kernel.org/linux-mm/3c752fc2-b6a0-2975-ffec-dba3edcf4155@fujitsu.com/T/ Thanks Zhijian > libnvdimm_init() *should* be done before the nfit driver can attempt > nvdimm_bus_register(). So, something else is broken if > nvdimm_bus_register() can be called before libnvdimm_init(), or after > libnvdimm_exit().
lizhijian@fujitsu.com wrote: > > > On 16/03/2023 23:54, Dan Williams wrote: > > Li Zhijian wrote: > >> nvdimm_bus_register() could be called from other modules, such as nfit, > >> but it can only be called after the nvdimm_bus_type is registered. > >> > >> BUG: kernel NULL pointer dereference, address: 0000000000000098 > >> #PF: supervisor read access in kernel mode > >> #PF: error_code(0x0000) - not-present page > >> PGD 0 P4D 0 > >> Oops: 0000 [#1] PREEMPT SMP PTI > >> CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97 > >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 > >> RIP: 0010:bus_add_device+0x58/0x150 > >> Call Trace: > >> <TASK> > >> device_add+0x3ac/0x980 > >> nvdimm_bus_register+0x16d/0x1d0 > >> acpi_nfit_init+0xb72/0x1f90 [nfit] > >> acpi_nfit_add+0x1d5/0x200 [nfit] > >> acpi_device_probe+0x45/0x160 > > > > Can you explain a bit more how to hit this crash? This has not been a > > problem historically and the explanation above makes it sound like this > > is a theoretical issue. > > > > Dan, > > Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter > 'initcall_blacklist=libnvdimm_init'. Then kernel panic! > Theoretically, it will also happen if nvdimm_bus_register() failed. > > > For kdump purpose[1], we need to disable libnvdimm driver to ensure metadata in pmem will not be updated again in kdump kernel > [1]https://lore.kernel.org/linux-mm/3c752fc2-b6a0-2975-ffec-dba3edcf4155@fujitsu.com/T/ Ah, great write up! Let me give that some thought. Apologies for missing it earlier. This would have been a good use for: Link: https://lore.kernel.org/linux-mm/3c752fc2-b6a0-2975-ffec-dba3edcf4155@fujitsu.com ...in the above changelog.
On 17/03/2023 10:26, Dan Williams wrote: > lizhijian@fujitsu.com wrote: >> >> >> On 16/03/2023 23:54, Dan Williams wrote: >>> Li Zhijian wrote: >>>> nvdimm_bus_register() could be called from other modules, such as nfit, >>>> but it can only be called after the nvdimm_bus_type is registered. >>>> >>>> BUG: kernel NULL pointer dereference, address: 0000000000000098 >>>> #PF: supervisor read access in kernel mode >>>> #PF: error_code(0x0000) - not-present page >>>> PGD 0 P4D 0 >>>> Oops: 0000 [#1] PREEMPT SMP PTI >>>> CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97 >>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 >>>> RIP: 0010:bus_add_device+0x58/0x150 >>>> Call Trace: >>>> <TASK> >>>> device_add+0x3ac/0x980 >>>> nvdimm_bus_register+0x16d/0x1d0 >>>> acpi_nfit_init+0xb72/0x1f90 [nfit] >>>> acpi_nfit_add+0x1d5/0x200 [nfit] >>>> acpi_device_probe+0x45/0x160 >>> >>> Can you explain a bit more how to hit this crash? This has not been a >>> problem historically and the explanation above makes it sound like this >>> is a theoretical issue. >>> >> >> Dan, >> >> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter >> 'initcall_blacklist=libnvdimm_init'. Then kernel panic! >> Theoretically, it will also happen if nvdimm_bus_register() failed. >> >> >> For kdump purpose[1], we need to disable libnvdimm driver to ensure metadata in pmem will not be updated again in kdump kernel >> [1]https://lore.kernel.org/linux-mm/3c752fc2-b6a0-2975-ffec-dba3edcf4155@fujitsu.com/T/ > > Ah, great write up! Let me give that some thought. That would be great. > Apologies for missing it earlier. Never mind :) > > This would have been a good use for: > > Link: https://lore.kernel.org/linux-mm/3c752fc2-b6a0-2975-ffec-dba3edcf4155@fujitsu.com > > ...in the above changelog. sounds good, i will update it in next version. Thanks Zhijian
lizhijian@fujitsu.com wrote: > > > On 16/03/2023 23:54, Dan Williams wrote: > > Li Zhijian wrote: > >> nvdimm_bus_register() could be called from other modules, such as nfit, > >> but it can only be called after the nvdimm_bus_type is registered. > >> > >> BUG: kernel NULL pointer dereference, address: 0000000000000098 > >> #PF: supervisor read access in kernel mode > >> #PF: error_code(0x0000) - not-present page > >> PGD 0 P4D 0 > >> Oops: 0000 [#1] PREEMPT SMP PTI > >> CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97 > >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 > >> RIP: 0010:bus_add_device+0x58/0x150 > >> Call Trace: > >> <TASK> > >> device_add+0x3ac/0x980 > >> nvdimm_bus_register+0x16d/0x1d0 > >> acpi_nfit_init+0xb72/0x1f90 [nfit] > >> acpi_nfit_add+0x1d5/0x200 [nfit] > >> acpi_device_probe+0x45/0x160 > > > > Can you explain a bit more how to hit this crash? This has not been a > > problem historically and the explanation above makes it sound like this > > is a theoretical issue. > > > > Dan, > > Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter > 'initcall_blacklist=libnvdimm_init'. Then kernel panic! That's expected though, you can't block libnvdimm_init and then expect modules that link to libnvdimm to work. You would also need to block all modules / initcalls that depend on libnvdimm_init having run. I'll respond to the other thread with some ideas.
On 17/03/2023 13:59, Dan Williams wrote: > lizhijian@fujitsu.com wrote: >> >> >> On 16/03/2023 23:54, Dan Williams wrote: >>> Li Zhijian wrote: >>>> nvdimm_bus_register() could be called from other modules, such as nfit, >>>> but it can only be called after the nvdimm_bus_type is registered. >>>> >>>> BUG: kernel NULL pointer dereference, address: 0000000000000098 >>>> #PF: supervisor read access in kernel mode >>>> #PF: error_code(0x0000) - not-present page >>>> PGD 0 P4D 0 >>>> Oops: 0000 [#1] PREEMPT SMP PTI >>>> CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97 >>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 >>>> RIP: 0010:bus_add_device+0x58/0x150 >>>> Call Trace: >>>> <TASK> >>>> device_add+0x3ac/0x980 >>>> nvdimm_bus_register+0x16d/0x1d0 >>>> acpi_nfit_init+0xb72/0x1f90 [nfit] >>>> acpi_nfit_add+0x1d5/0x200 [nfit] >>>> acpi_device_probe+0x45/0x160 >>> >>> Can you explain a bit more how to hit this crash? This has not been a >>> problem historically and the explanation above makes it sound like this >>> is a theoretical issue. >>> >> >> Dan, >> >> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter >> 'initcall_blacklist=libnvdimm_init'. Then kernel panic! > > That's expected though, Do you mean we just keep it as it is. > you can't block libnvdimm_init and then expect > modules that link to libnvdimm to work. Ah, we would rather see it *unable to work* than panic, isn't it. Thanks Zhijian > You would also need to block all > modules / initcalls that depend on libnvdimm_init having runI'll > respond to the other thread with some ideas.
lizhijian@fujitsu.com wrote: [..] > >> Dan, > >> > >> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter > >> 'initcall_blacklist=libnvdimm_init'. Then kernel panic! > > > > That's expected though, > > Do you mean we just keep it as it is. Yes. > > > > you can't block libnvdimm_init and then expect > > modules that link to libnvdimm to work. > Ah, we would rather see it *unable to work* than panic, isn't it. That part is true, but consider the implications of adding error handling to all code that can no longer depend on initcall ordering, not just libnvdimm. This would be a large paradigm shift. Now I do think it would be a good idea to fail device_add() if the bus is not registered, but I *think* that happens now as a result of: 5221b82d46f2 driver core: bus: bus_add/probe/remove_device() cleanups ...can you double check if you have that commit in your tests?
On 21/03/2023 01:30, Dan Williams wrote: > lizhijian@fujitsu.com wrote: > [..] >>>> Dan, >>>> >>>> Configure the kconfig with ACPI_NFIT [=m] && LIBNVDIMM [=y], and add extra kernel booting parameter >>>> 'initcall_blacklist=libnvdimm_init'. Then kernel panic! >>> >>> That's expected though, >> >> Do you mean we just keep it as it is. > > Yes. > >> >> >>> you can't block libnvdimm_init and then expect >>> modules that link to libnvdimm to work. >> Ah, we would rather see it *unable to work* than panic, isn't it. > > That part is true, but consider the implications of adding error > handling to all code that can no longer depend on initcall ordering, not > just libnvdimm. This would be a large paradigm shift. > > Now I do think it would be a good idea to fail device_add() if the bus > is not registered, but I *think* that happens now as a result of: > > 5221b82d46f2 driver core: bus: bus_add/probe/remove_device() cleanups > > ...can you double check if you have that commit in your tests? Great, panic is gone after i upgraded to the upstream! > Now I do think it would be a good idea to fail device_add() if the bus > is not registered, BTW, below line 369: device_add() didn't fail in practical. 354 mutex_init(&nvdimm_bus->reconfig_mutex); 355 badrange_init(&nvdimm_bus->badrange); 356 nvdimm_bus->nd_desc = nd_desc; 357 nvdimm_bus->dev.parent = parent; 358 nvdimm_bus->dev.type = &nvdimm_bus_dev_type; 359 nvdimm_bus->dev.groups = nd_desc->attr_groups; 360 nvdimm_bus->dev.bus = &nvdimm_bus_type; 361 nvdimm_bus->dev.of_node = nd_desc->of_node; 362 device_initialize(&nvdimm_bus->dev); 363 lockdep_set_class(&nvdimm_bus->dev.mutex, &nvdimm_bus_key); 364 device_set_pm_not_required(&nvdimm_bus->dev); 365 rc = dev_set_name(&nvdimm_bus->dev, "ndbus%d", nvdimm_bus->id); 366 if (rc) 367 goto err; 368 369 rc = device_add(&nvdimm_bus->dev); 370 dev_err(&nvdimm_bus->dev, "registration failed: %d\n", rc); Thanks Zhijian
lizhijian@fujitsu.com wrote: [..] > > Now I do think it would be a good idea to fail device_add() if the bus > > is not registered, > > BTW, below line 369: device_add() didn't fail in practical. > I think that's ok because the device gets added, but never probed due to this part of that commit I referenced: @@ -503,20 +517,21 @@ int bus_add_device(struct device *dev) */ void bus_probe_device(struct device *dev) { - struct bus_type *bus = dev->bus; + struct subsys_private *sp = bus_to_subsys(dev->bus); struct subsys_interface *sif; - if (!bus) + if (!sp) return; ...so it does what you want which is disable the libnvdimm subsystem from binding to any devices without crashing.
On 21/03/2023 10:38, Dan Williams wrote: > lizhijian@fujitsu.com wrote: > [..] >>> Now I do think it would be a good idea to fail device_add() if the bus >>> is not registered, >> >> BTW, below line 369: device_add() didn't fail in practical. >> > > I think that's ok because the device gets added, but never probed due to > this part of that commit I referenced: Very thanks for your explanation. > > @@ -503,20 +517,21 @@ int bus_add_device(struct device *dev) > */ > void bus_probe_device(struct device *dev) > { > - struct bus_type *bus = dev->bus; > + struct subsys_private *sp = bus_to_subsys(dev->bus); > struct subsys_interface *sif; > > - if (!bus) > + if (!sp) > return; > > > ...so it does what you want which is disable the libnvdimm subsystem > from binding to any devices without crashing. Yes, it's fine enough to me.
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c index ada61bbf49c1..ea66053072cb 100644 --- a/drivers/nvdimm/bus.c +++ b/drivers/nvdimm/bus.c @@ -28,6 +28,7 @@ static int nvdimm_bus_major; struct class *nd_class; static DEFINE_IDA(nd_ida); +static bool nvdimm_bus_type_registered; static int to_nd_device_type(struct device *dev) { @@ -337,6 +338,10 @@ struct nvdimm_bus *nvdimm_bus_register(struct device *parent, struct nvdimm_bus *nvdimm_bus; int rc; + if (!nvdimm_bus_type_registered) { + pr_warn("nvdimm bus type is not registered\n"); + return NULL; + } nvdimm_bus = kzalloc(sizeof(*nvdimm_bus), GFP_KERNEL); if (!nvdimm_bus) return NULL; @@ -1321,6 +1326,7 @@ int __init nvdimm_bus_init(void) if (rc) goto err_nd_bus; + nvdimm_bus_type_registered = true; return 0; err_nd_bus: @@ -1343,4 +1349,5 @@ void nvdimm_bus_exit(void) unregister_chrdev(nvdimm_major, "dimmctl"); bus_unregister(&nvdimm_bus_type); ida_destroy(&nd_ida); + nvdimm_bus_type_registered = false; }
nvdimm_bus_register() could be called from other modules, such as nfit, but it can only be called after the nvdimm_bus_type is registered. BUG: kernel NULL pointer dereference, address: 0000000000000098 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 6.2.0-rc6-pmem+ #97 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:bus_add_device+0x58/0x150 Call Trace: <TASK> device_add+0x3ac/0x980 nvdimm_bus_register+0x16d/0x1d0 acpi_nfit_init+0xb72/0x1f90 [nfit] acpi_nfit_add+0x1d5/0x200 [nfit] acpi_device_probe+0x45/0x160 really_probe+0xce/0x390 __driver_probe_device+0x78/0x180 driver_probe_device+0x1e/0x90 __driver_attach+0xd6/0x1d0 bus_for_each_dev+0x7b/0xc0 bus_add_driver+0x1ac/0x200 driver_register+0x8f/0xf0 nfit_init+0x164/0xff0 [nfit] do_one_initcall+0x5b/0x320 do_init_module+0x4c/0x1f0 __do_sys_finit_module+0xb4/0x130 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> --- drivers/nvdimm/bus.c | 7 +++++++ 1 file changed, 7 insertions(+)