diff mbox series

[2/2] ACPI: APEI: Warn loudly on unsuccessful driver unbind

Message ID 20221013064459.121933-2-u.kleine-koenig@pengutronix.de (mailing list archive)
State Changes Requested, archived
Headers show
Series [1/2] ACPI: APEI: Drop unsetting driver data on remove | expand

Commit Message

Uwe Kleine-König Oct. 13, 2022, 6:44 a.m. UTC
If the remove callback failed, it leaves some unfreed resources behind
that will never be cleared. I didn't manage to understand the driver
good enough to judge how critical that really is.

This patch is part of an effort to change the remove callbacks for
platform devices to return void in the hope this will prevent the wrong
assumption that returning an error code from .remove() is proper error
handling.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
---
 drivers/acpi/apei/ghes.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

Hello,

on a side note: The remove callback calls (in some cases) free_irq() for
a shared interrupt. A requirement in this case is to disable the
device's interrupt beforehand. It's not obvious (to me that is) that
said irq is disabled here. This is another opportunity for ugly things
to happen.

Best regards
Uwe

Comments

Rafael J. Wysocki Oct. 28, 2022, 4:47 p.m. UTC | #1
On Thu, Oct 13, 2022 at 8:53 AM Uwe Kleine-König
<u.kleine-koenig@pengutronix.de> wrote:
>
> If the remove callback failed, it leaves some unfreed resources behind
> that will never be cleared. I didn't manage to understand the driver
> good enough to judge how critical that really is.
>
> This patch is part of an effort to change the remove callbacks for
> platform devices to return void in the hope this will prevent the wrong
> assumption that returning an error code from .remove() is proper error
> handling.
>
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> ---
>  drivers/acpi/apei/ghes.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> Hello,
>
> on a side note: The remove callback calls (in some cases) free_irq() for
> a shared interrupt. A requirement in this case is to disable the
> device's interrupt beforehand. It's not obvious (to me that is) that
> said irq is disabled here. This is another opportunity for ugly things
> to happen.
>
> Best regards
> Uwe
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 307fbb97a116..78d2e4df74ee 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -1393,7 +1393,7 @@ static int ghes_probe(struct platform_device *ghes_dev)
>         return rc;
>  }
>
> -static int ghes_remove(struct platform_device *ghes_dev)
> +static int _ghes_remove(struct platform_device *ghes_dev)
>  {
>         int rc;
>         struct ghes *ghes;
> @@ -1447,6 +1447,21 @@ static int ghes_remove(struct platform_device *ghes_dev)
>         return 0;
>  }
>
> +static int ghes_remove(struct platform_device *ghes_dev)
> +{
> +       /*
> +        * If _ghes_remove() returns an error, we're in trouble. Some of the
> +        * cleanup was skipped then and this will never be catched up. So some
> +        * resources will stay around, maybe even used although the platform
> +        * device will be gone.
> +        */
> +       int err = _ghes_remove(ghes_dev);
> +
> +       WARN_ON(err);
> +
> +       return 0;
> +}

No, no, no, we don't cut corners like this.

It is not very hard to observe that the only case in which
ghes_remove() can return an error is when it calls
apei_sdei_unregister_ghes().

So if you look at that one, you'll notice that it only propagates the
return value of sdei_unregister_ghes() (apart from the trivial "no
support" case).

Now, sdei_unregister_ghes() really does things that shouldn't fail
(because how can firmware refuse to disable an event on unregister
it), so that's where the warning should be emitted (in case they fail
nevertheless).

Also I don't think that dumping the stack is worth it, because the
code path is known.


> +
>  static struct platform_driver ghes_platform_driver = {
>         .driver         = {
>                 .name   = "GHES",
> --
diff mbox series

Patch

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 307fbb97a116..78d2e4df74ee 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1393,7 +1393,7 @@  static int ghes_probe(struct platform_device *ghes_dev)
 	return rc;
 }
 
-static int ghes_remove(struct platform_device *ghes_dev)
+static int _ghes_remove(struct platform_device *ghes_dev)
 {
 	int rc;
 	struct ghes *ghes;
@@ -1447,6 +1447,21 @@  static int ghes_remove(struct platform_device *ghes_dev)
 	return 0;
 }
 
+static int ghes_remove(struct platform_device *ghes_dev)
+{
+	/*
+	 * If _ghes_remove() returns an error, we're in trouble. Some of the
+	 * cleanup was skipped then and this will never be catched up. So some
+	 * resources will stay around, maybe even used although the platform
+	 * device will be gone.
+	 */
+	int err = _ghes_remove(ghes_dev);
+
+	WARN_ON(err);
+
+	return 0;
+}
+
 static struct platform_driver ghes_platform_driver = {
 	.driver		= {
 		.name	= "GHES",