platform/x86: thinkpad_acpi: Fix NULL pointer dereferences while probing

Message ID 20250330-thinkpad-fix-v1-1-4906b3fe6b74@gmail.com (mailing list archive)
State Accepted, archived
Series platform/x86: thinkpad_acpi: Fix NULL pointer dereferences while probing

Commit Message

Kurt Borja March 30, 2025, 3:39 p.m. UTC
Some subdrivers make use of the global reference tpacpi_pdev during
initialization, which is called from the platform driver's probe.
However, after

commit 38b9ab80db31 ("platform/x86: thinkpad_acpi: Move subdriver initialization to tpacpi_pdriver's probe.")

this variable is only properly initialized *after* probing and this can
result in a NULL pointer dereference.

In order to fix this without reverting the commit, register the platform
bundle in two steps: first create and initialize tpacpi_pdev, then
register the driver synchronously with platform_driver_probe(). This way
the benefits of commit 38b9ab80db31 are preserved.

Additionally,

commit 43fc63a1e8f6 ("platform/x86: thinkpad_acpi: Move HWMON initialization to tpacpi_hwmon_pdriver's probe")

introduced a similar problem. However, tpacpi_sensors_pdev is only used
once inside the probe, so replace the global reference with the one
given by the probe.

Reported-by: Damian Tometzki <damian@riscv-rocks.de>
Closes: https://lore.kernel.org/r/CAL=B37kdL1orSQZD2A3skDOevRXBzF__cJJgY_GFh9LZO3FMsw@mail.gmail.com/
Fixes: 38b9ab80db31 ("platform/x86: thinkpad_acpi: Move subdriver initialization to tpacpi_pdriver's probe.")
Fixes: 43fc63a1e8f6 ("platform/x86: thinkpad_acpi: Move HWMON initialization to tpacpi_hwmon_pdriver's probe")
Tested-by: Damian Tometzki <damian@riscv-rocks.de>
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
---
Hi all,

The commit message is pretty self-explanatory. I have one question
though. As you can see in the crash dump of the original report:

Mar 29 17:43:16.180758 fedora kernel:  ? asm_exc_page_fault+0x26/0x30
Mar 29 17:43:16.180769 fedora kernel:  ? __pfx_klist_children_get+0x10/0x10
Mar 29 17:43:16.180781 fedora kernel:  ? kobject_get+0xd/0x70
Mar 29 17:43:16.180792 fedora kernel:  device_add+0x8f/0x6e0
Mar 29 17:43:16.180804 fedora kernel:  rfkill_register+0xbc/0x2c0 [rfkill]
Mar 29 17:43:16.180813 fedora kernel:  tpacpi_new_rfkill+0x185/0x230 [thinkpad_acpi]

The NULL dereference happens in device_add(), inside rfkill_register().
This bothers me because, as you can see here:

 1198                 atp_rfk->rfkill = rfkill_alloc(name,
 1199                                                 &tpacpi_pdev->dev,
 1200                                                 rfktype,
 1201                                                 &tpacpi_rfk_rfkill_ops,
 1202                                                 atp_rfk);

the NULL dereference happens in line 1199, inside tpacpi_new_rfkill(). I
think this disagreement might be due to compile-time optimizations?

Well, if someone knows better, let me know!

(This driver is going to give me nightmares, sorry for the bug!)
---
 drivers/platform/x86/thinkpad_acpi.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)


---
base-commit: 1a9239bb4253f9076b5b4b2a1a4e8d7defd77a95
change-id: 20250330-thinkpad-fix-98db0d8c3be3

Best regards,

Comments

Kurt Borja March 30, 2025, 6:43 p.m. UTC | #1
On Sun Mar 30, 2025 at 12:39 PM -03, Kurt Borja wrote:
> Some subdrivers make use of the global reference tpacpi_pdev during
> initialization, which is called from the platform driver's probe.
> However, after
>
> commit 38b9ab80db31 ("platform/x86: thinkpad_acpi: Move subdriver initialization to tpacpi_pdriver's probe.")
>
> this variable is only properly initialized *after* probing and this can
> result in a NULL pointer dereference.
>
> In order to fix this without reverting the commit, register the platform
> bundle in two steps, first create and initialize tpacpi_pdev, then
> register the driver synchronously with platform_driver_probe(). This way
> the benefits of commit 38b9ab80db31 are preserved.
>
> Additionally,
>
> commit 43fc63a1e8f6 ("platform/x86: thinkpad_acpi: Move HWMON initialization to tpacpi_hwmon_pdriver's probe")
>
> introduced a similar problem, however tpacpi_sensors_pdev is only used
> once inside the probe, so replace the global reference with the one
> given by the probe.

I don't understand why b4 added the linux-riscv list to the recipients,
but it was definitely not intended.

Sorry for the noise.

Genes Lists March 31, 2025, 5:26 p.m. UTC | #2
On Sun, 2025-03-30 at 12:39 -0300, Kurt Borja wrote:
> Some subdrivers make use of the global reference tpacpi_pdev during
> initialization, which is called from the platform driver's probe.
> However, after
> 
> commit 38b9ab80db31 ("platform/x86: thinkpad_acpi: Move subdriver
> initialization to tpacpi_pdriver's probe.")
> 
> this variable is only properly initialized *after* probing and this
> can
> result in a NULL pointer dereference.
> 
> In order to fix this without reverting the commit, register the
> platform
> bundle in two steps, first create and initialize tpacpi_pdev, then
> register the driver synchronously with platform_driver_probe(). This
> way
> the benefits of commit 38b9ab80db31 are preserved.
> 
> Additionally,
> 
> commit 43fc63a1e8f6 ("platform/x86: thinkpad_acpi: Move HWMON
> initialization to tpacpi_hwmon_pdriver's probe")
> 
> introduced a similar problem, however tpacpi_sensors_pdev is only
> used
> once inside the probe, so replace the global reference with the one
> given by the probe.
> 
> ...
> base-commit: 1a9239bb4253f9076b5b4b2a1a4e8d7defd77a95
> change-id: 20250330-thinkpad-fix-98db0d8c3be3
> 
Fixed problem seen here on thinkpad.
Tested on mainline commit 4e82c87058f45e79eeaa4d5bcc3b38dd3dce7209

Tested-by: Gene C <arch@sapience.com>

Ilpo Järvinen April 1, 2025, 11:24 a.m. UTC | #3
On Sun, 30 Mar 2025, Kurt Borja wrote:

> Some subdrivers make use of the global reference tpacpi_pdev during
> initialization, which is called from the platform driver's probe.
> However, after
> 
> commit 38b9ab80db31 ("platform/x86: thinkpad_acpi: Move subdriver initialization to tpacpi_pdriver's probe.")
> 

Next time, please include these into the paragraph flow normally obeying 
the normal paragraph formatting. I changed them in this case.

> this variable is only properly initialized *after* probing and this can
> result in a NULL pointer dereference.
> 
> In order to fix this without reverting the commit, register the platform
> bundle in two steps, first create and initialize tpacpi_pdev, then
> register the driver synchronously with platform_driver_probe(). This way
> the benefits of commit 38b9ab80db31 are preserved.
> 
> Additionally,
> 
> commit 43fc63a1e8f6 ("platform/x86: thinkpad_acpi: Move HWMON initialization to tpacpi_hwmon_pdriver's probe")
> 
> introduced a similar problem, however tpacpi_sensors_pdev is only used
> once inside the probe, so replace the global reference with the one
> given by the probe.
> 
> Reported-by: Damian Tometzki <damian@riscv-rocks.de>
> Closes: https://lore.kernel.org/r/CAL=B37kdL1orSQZD2A3skDOevRXBzF__cJJgY_GFh9LZO3FMsw@mail.gmail.com/
> Fixes: 38b9ab80db31 ("platform/x86: thinkpad_acpi: Move subdriver initialization to tpacpi_pdriver's probe.")
> Fixes: 43fc63a1e8f6 ("platform/x86: thinkpad_acpi: Move HWMON initialization to tpacpi_hwmon_pdriver's probe")
> Tested-by: Damian Tometzki <damian@riscv-rocks.de>
> Signed-off-by: Kurt Borja <kuurtb@gmail.com>

Applied to the review-ilpo-fixes branch.

> ---
> Hi all,
> 
> The commit message is pretty self-explanatory. I have one question
> though. As you can see in the crash dump of the original report:
> 
> Mar 29 17:43:16.180758 fedora kernel:  ? asm_exc_page_fault+0x26/0x30
> Mar 29 17:43:16.180769 fedora kernel:  ? __pfx_klist_children_get+0x10/0x10
> Mar 29 17:43:16.180781 fedora kernel:  ? kobject_get+0xd/0x70
> Mar 29 17:43:16.180792 fedora kernel:  device_add+0x8f/0x6e0
> Mar 29 17:43:16.180804 fedora kernel:  rfkill_register+0xbc/0x2c0 [rfkill]
> Mar 29 17:43:16.180813 fedora kernel:  tpacpi_new_rfkill+0x185/0x230 [thinkpad_acpi]
> 
> The NULL dereference happens in device_add(), inside rfkill_register().
> This bothers me because, as you can see here:
> 
>  1198                 atp_rfk->rfkill = rfkill_alloc(name,
>  1199                                                 &tpacpi_pdev->dev,
>  1200                                                 rfktype,
>  1201                                                 &tpacpi_rfk_rfkill_ops,
>  1202                                                 atp_rfk);
> 
> the NULL dereference happens in line 1199, inside tpacpi_new_rfkill(). I
> think this disagreement might be due to compile-time optimizations?

How did you map it to line numbers? Is it just about difference in the 
compiled binaries that results in different line numbers?

Kurt Borja April 1, 2025, 2:43 p.m. UTC | #4
Hi Ilpo,

On Tue Apr 1, 2025 at 8:24 AM -03, Ilpo Järvinen wrote:
> On Sun, 30 Mar 2025, Kurt Borja wrote:
>
>> Some subdrivers make use of the global reference tpacpi_pdev during
>> initialization, which is called from the platform driver's probe.
>> However, after
>> 
>> commit 38b9ab80db31 ("platform/x86: thinkpad_acpi: Move subdriver initialization to tpacpi_pdriver's probe.")
>> 
>
> Next time, please include these into the paragraph flow normally obeying 
> the normal paragraph formatting. I changed them in this case.

Thanks, won't happen next time.

>
>> this variable is only properly initialized *after* probing and this can
>> result in a NULL pointer dereference.
>> 
>> In order to fix this without reverting the commit, register the platform
>> bundle in two steps, first create and initialize tpacpi_pdev, then
>> register the driver synchronously with platform_driver_probe(). This way
>> the benefits of commit 38b9ab80db31 are preserved.
>> 
>> Additionally,
>> 
>> commit 43fc63a1e8f6 ("platform/x86: thinkpad_acpi: Move HWMON initialization to tpacpi_hwmon_pdriver's probe")
>> 
>> introduced a similar problem, however tpacpi_sensors_pdev is only used
>> once inside the probe, so replace the global reference with the one
>> given by the probe.
>> 
>> Reported-by: Damian Tometzki <damian@riscv-rocks.de>
>> Closes: https://lore.kernel.org/r/CAL=B37kdL1orSQZD2A3skDOevRXBzF__cJJgY_GFh9LZO3FMsw@mail.gmail.com/
>> Fixes: 38b9ab80db31 ("platform/x86: thinkpad_acpi: Move subdriver initialization to tpacpi_pdriver's probe.")
>> Fixes: 43fc63a1e8f6 ("platform/x86: thinkpad_acpi: Move HWMON initialization to tpacpi_hwmon_pdriver's probe")
>> Tested-by: Damian Tometzki <damian@riscv-rocks.de>
>> Signed-off-by: Kurt Borja <kuurtb@gmail.com>
>
> Applied to the review-ilpo-fixes branch.

Thank you!

>
>> ---
>> Hi all,
>> 
>> The commit message is pretty self-explanatory. I have one question
>> though. As you can see in the crash dump of the original report:
>> 
>> Mar 29 17:43:16.180758 fedora kernel:  ? asm_exc_page_fault+0x26/0x30
>> Mar 29 17:43:16.180769 fedora kernel:  ? __pfx_klist_children_get+0x10/0x10
>> Mar 29 17:43:16.180781 fedora kernel:  ? kobject_get+0xd/0x70
>> Mar 29 17:43:16.180792 fedora kernel:  device_add+0x8f/0x6e0
>> Mar 29 17:43:16.180804 fedora kernel:  rfkill_register+0xbc/0x2c0 [rfkill]
>> Mar 29 17:43:16.180813 fedora kernel:  tpacpi_new_rfkill+0x185/0x230 [thinkpad_acpi]
>> 
>> The NULL dereference happens in device_add(), inside rfkill_register().
>> This bothers me because, as you can see here:
>> 
>>  1198                 atp_rfk->rfkill = rfkill_alloc(name,
>>  1199                                                 &tpacpi_pdev->dev,
>>  1200                                                 rfktype,
>>  1201                                                 &tpacpi_rfk_rfkill_ops,
>>  1202                                                 atp_rfk);
>> 
>> the NULL dereference happens in line 1199, inside tpacpi_new_rfkill(). I
>> think this disagreement might be due to compile-time optimizations?
>
> How did you map it to line numbers? Is it just about difference in the 
> compiled binaries that results in different line numbers?

Oh - I just manually followed the dump trace in search of the first
instance of a NULL dereference. If I understand correctly, inside
thinkpad_acpi we do reach rfkill_register(), which is line

 1227         res = rfkill_register(atp_rfk->rfkill);

and I imagine the RIP happens when device_add() tries to get a reference
to the parent of the allocated rfkill device. But it's weird because we
shouldn't even reach 1227, as the NULL deref first happens at 1199.

NULL deref is UB so I guess it makes sense?

BTW I got all these line numbers using the base commit.
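
One plausible reading of the trace, sketched in userspace C rather than
taken from the kernel sources: evaluating &tpacpi_pdev->dev with a NULL
tpacpi_pdev normally compiles to an address computation (NULL plus the
offset of dev) and reads no memory, so line 1199 itself would not fault.
The resulting small, non-NULL pointer would then be stored as the rfkill
device's parent, and the first real read would happen later, when
device_add() takes a reference on that parent. The struct layouts and
helper names below (fake_device, fake_platform_device, dev_address,
parent_get_would_fault) are hypothetical stand-ins, and the 4096-byte
page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: the real struct device and struct
 * platform_device are much larger and laid out differently. */
struct fake_device {
	struct fake_device *parent;
	int refcount;
};

struct fake_platform_device {
	const char *name;
	int id;
	struct fake_device dev;	/* assumed NOT to be the first member */
};

/* Models what "&pdev->dev" usually compiles to: base address plus a
 * constant offset, with no memory access. Done on integers so the
 * sketch never forms an invalid pointer itself. */
static uintptr_t dev_address(uintptr_t pdev_base)
{
	/* The argument only holds if dev sits at a non-zero offset. */
	assert(offsetof(struct fake_platform_device, dev) > 0);
	return pdev_base + offsetof(struct fake_platform_device, dev);
}

/* Models the first point where the parent is actually read. Returns 1
 * if the address lies in the (assumed unmapped) first 4096-byte page. */
static int parent_get_would_fault(uintptr_t parent_addr)
{
	return parent_addr < 4096;
}
```

With these definitions, dev_address(0) equals
offsetof(struct fake_platform_device, dev), which is non-zero, so a
plain NULL check on the parent would pass and the fault would only
surface at the first access, consistent with a crash under
device_add() rather than in tpacpi_new_rfkill().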

Patch

diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c
index 0384cf31187872df90f5ac3def9b1d6617e82ed5..a17efb68664c9c7723daa2aba023ba0cbc6b96dd 100644
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -367,6 +367,7 @@  static struct {
 	u32 beep_needs_two_args:1;
 	u32 mixer_no_level_control:1;
 	u32 battery_force_primary:1;
+	u32 platform_drv_registered:1;
 	u32 hotkey_poll_active:1;
 	u32 has_adaptive_kbd:1;
 	u32 kbd_lang:1;
@@ -11820,10 +11821,10 @@  static void thinkpad_acpi_module_exit(void)
 		platform_device_unregister(tpacpi_sensors_pdev);
 	}
 
-	if (tpacpi_pdev) {
+	if (tp_features.platform_drv_registered)
 		platform_driver_unregister(&tpacpi_pdriver);
+	if (tpacpi_pdev)
 		platform_device_unregister(tpacpi_pdev);
-	}
 
 	if (proc_dir)
 		remove_proc_entry(TPACPI_PROC_DIR, acpi_root_dir);
@@ -11893,9 +11894,8 @@  static int __init tpacpi_pdriver_probe(struct platform_device *pdev)
 
 static int __init tpacpi_hwmon_pdriver_probe(struct platform_device *pdev)
 {
-	tpacpi_hwmon = devm_hwmon_device_register_with_groups(
-		&tpacpi_sensors_pdev->dev, TPACPI_NAME, NULL, tpacpi_hwmon_groups);
-
+	tpacpi_hwmon = devm_hwmon_device_register_with_groups(&pdev->dev, TPACPI_NAME,
+							      NULL, tpacpi_hwmon_groups);
 	if (IS_ERR(tpacpi_hwmon))
 		pr_err("unable to register hwmon device\n");
 
@@ -11965,16 +11965,24 @@  static int __init thinkpad_acpi_module_init(void)
 		tp_features.quirks = dmi_id->driver_data;
 
 	/* Device initialization */
-	tpacpi_pdev = platform_create_bundle(&tpacpi_pdriver, tpacpi_pdriver_probe,
-					     NULL, 0, NULL, 0);
+	tpacpi_pdev = platform_device_register_simple(TPACPI_DRVR_NAME, PLATFORM_DEVID_NONE,
+						      NULL, 0);
 	if (IS_ERR(tpacpi_pdev)) {
 		ret = PTR_ERR(tpacpi_pdev);
 		tpacpi_pdev = NULL;
-		pr_err("unable to register platform device/driver bundle\n");
+		pr_err("unable to register platform device\n");
 		thinkpad_acpi_module_exit();
 		return ret;
 	}
 
+	ret = platform_driver_probe(&tpacpi_pdriver, tpacpi_pdriver_probe);
+	if (ret) {
+		pr_err("unable to register main platform driver\n");
+		thinkpad_acpi_module_exit();
+		return ret;
+	}
+	tp_features.platform_drv_registered = 1;
+
 	tpacpi_sensors_pdev = platform_create_bundle(&tpacpi_hwmon_pdriver,
 						     tpacpi_hwmon_pdriver_probe,
 						     NULL, 0, NULL, 0);
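
The ordering problem described in the commit message can be modelled in
userspace. This is only a sketch under stated assumptions, not kernel
code: model_pdev, model_create_bundle, model_probe, old_flow and
new_flow are hypothetical names, and the model keeps only the one
property that matters here, namely whether the global pointer is
already set when the probe callback runs.

```c
#include <assert.h>
#include <stddef.h>

struct model_pdev { int id; };

static struct model_pdev the_device;	/* the single device instance */
static struct model_pdev *global_pdev;	/* plays the role of tpacpi_pdev */
static int global_seen_in_probe;	/* 1 if probe saw a non-NULL global */

/* Probe callback: records whether the global was usable at probe time. */
static int model_probe(struct model_pdev *pdev)
{
	(void)pdev;
	global_seen_in_probe = (global_pdev != NULL);
	return 0;
}

/* Old flow: a single call creates the device, runs probe, and only
 * then returns the device to the caller. */
static struct model_pdev *model_create_bundle(int (*probe)(struct model_pdev *))
{
	assert(probe != NULL);
	if (probe(&the_device))
		return NULL;
	return &the_device;
}

/* Returns what probe observed when the global is assigned afterwards. */
static int old_flow(void)
{
	global_pdev = NULL;
	global_pdev = model_create_bundle(model_probe);
	return global_seen_in_probe;
}

/* Returns what probe observed when the global is assigned first. */
static int new_flow(void)
{
	global_pdev = NULL;
	global_pdev = &the_device;	/* step 1: device exists, global set */
	if (model_probe(global_pdev))	/* step 2: probe runs afterwards */
		return -1;
	return global_seen_in_probe;
}
```

In this model old_flow() returns 0 because the probe ran before the
assignment, while new_flow() returns 1. The hwmon hunk takes the other
route and sidesteps the global entirely by using the pdev argument the
probe already receives.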