diff mbox series

[net-next,01/10] net: ipa: Don't error out in .remove()

Message ID 20231117095922.876489-2-u.kleine-koenig@pengutronix.de (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net*: Convert to platform remove callback returning void | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1127 this patch: 1127
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 1154 this patch: 1154
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1154 this patch: 1154
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Uwe Kleine-König Nov. 17, 2023, 9:59 a.m. UTC
Returning early from .remove() with an error code still results in the
driver unbinding the device. So the driver core ignores the returned error
code and the resources that were not freed are never catched up. In
combination with devm this also often results in use-after-free bugs.

Here even if the modem cannot be stopped, resources must be freed. So
replace the early error return by an error message an continue to clean up.

This prepares changing ipa_remove() to return void.

Fixes: cdf2e9419dd9 ("soc: qcom: ipa: main code")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
---
 drivers/net/ipa/ipa_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Alex Elder Nov. 17, 2023, 2:16 p.m. UTC | #1
On 11/17/23 3:59 AM, Uwe Kleine-König wrote:
> Returning early from .remove() with an error code still results in the
> driver unbinding the device. So the driver core ignores the returned error
> code and the resources that were not freed are never catched up. In
> combination with devm this also often results in use-after-free bugs.
> 
> Here even if the modem cannot be stopped, resources must be freed. So
> replace the early error return by an error message an continue to clean up.
> 
> This prepares changing ipa_remove() to return void.
> 
> Fixes: cdf2e9419dd9 ("soc: qcom: ipa: main code")

Is this really a bug fix?  This code was doing the right
thing even if the caller was not.

> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> ---
>   drivers/net/ipa/ipa_main.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
> index da853353a5c7..60e4f590f5de 100644
> --- a/drivers/net/ipa/ipa_main.c
> +++ b/drivers/net/ipa/ipa_main.c
> @@ -960,7 +960,8 @@ static int ipa_remove(struct platform_device *pdev)
>   			ret = ipa_modem_stop(ipa);
>   		}
>   		if (ret)
> -			return ret;
> +			dev_err(dev, "Failed to stop modem (%pe)\n",
> +				ERR_PTR(ret));

I think this is not correct, or rather, I think it is less
correct than returning early.

What's happening here is we're trying to stop the modem.
It is an external entity that might have some in-flight
activity that could include "owning" some buffers provided
by Linux, to be filled with received data.  There's a
chance that cleaning up (with the call to ipa_teardown())
can do the right thing, but I'm not going to sign off on
this until I've looked at that in closer detail.

This is something that *could* happen but is not *expected*
to happen.  We expect stopping the modem to succeed so if
it doesn't, something's wrong and it's not 100% clear how
to properly handle it.

For now...  you know a little more about my hesitation, but
please wait to commit this change until I've had a chance
to spend more time reviewing.

					-Alex

>   
>   		ipa_teardown(ipa);
>   	}
Uwe Kleine-König Nov. 17, 2023, 2:45 p.m. UTC | #2
Hello Alex,

On Fri, Nov 17, 2023 at 08:16:02AM -0600, Alex Elder wrote:
> On 11/17/23 3:59 AM, Uwe Kleine-König wrote:
> > Returning early from .remove() with an error code still results in the
> > driver unbinding the device. So the driver core ignores the returned error
> > code and the resources that were not freed are never catched up. In
> > combination with devm this also often results in use-after-free bugs.
> > 
> > Here even if the modem cannot be stopped, resources must be freed. So
> > replace the early error return by an error message an continue to clean up.
> > 
> > This prepares changing ipa_remove() to return void.
> > 
> > Fixes: cdf2e9419dd9 ("soc: qcom: ipa: main code")
> 
> Is this really a bug fix?  This code was doing the right
> thing even if the caller was not.

Yes, since cdf2e9419dd9 the driver is leaking resources if
ipa_modem_stop() fails. I call that a bug.

> > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> > ---
> >   drivers/net/ipa/ipa_main.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
> > index da853353a5c7..60e4f590f5de 100644
> > --- a/drivers/net/ipa/ipa_main.c
> > +++ b/drivers/net/ipa/ipa_main.c
> > @@ -960,7 +960,8 @@ static int ipa_remove(struct platform_device *pdev)
> >   			ret = ipa_modem_stop(ipa);
> >   		}
> >   		if (ret)
> > -			return ret;
> > +			dev_err(dev, "Failed to stop modem (%pe)\n",
> > +				ERR_PTR(ret));
> 
> I think this is not correct, or rather, I think it is less
> correct than returning early.
> 
> What's happening here is we're trying to stop the modem.
> It is an external entity that might have some in-flight
> activity that could include "owning" some buffers provided
> by Linux, to be filled with received data.  There's a
> chance that cleaning up (with the call to ipa_teardown())
> can do the right thing, but I'm not going to sign off on
> this until I've looked at that in closer detail.
> 
> This is something that *could* happen but is not *expected*
> to happen.  We expect stopping the modem to succeed so if
> it doesn't, something's wrong and it's not 100% clear how
> to properly handle it.

Returning early is wrong for sure. You skip for example the free_irq
step, so the irq stays active, it might trigger and use the unbound
device. And as the device is unbound, .remove() is never retried and the
irq (and all the other resources) are never freed.

Take the time you need to review the changes. If you don't want to
accept the change now, I'd like to apply the following change:

diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
index da853353a5c7..f77decf0b1e4 100644
--- a/drivers/net/ipa/ipa_main.c
+++ b/drivers/net/ipa/ipa_main.c
@@ -959,8 +959,11 @@ static int ipa_remove(struct platform_device *pdev)
 			usleep_range(USEC_PER_MSEC, 2 * USEC_PER_MSEC);
 			ret = ipa_modem_stop(ipa);
 		}
-		if (ret)
-			return ret;
+		if (ret) {
+			dev_err(dev, "Failed to stop modem (%pe)\n",
+				ERR_PTR(ret));
+			return 0;
+		}
 
 		ipa_teardown(ipa);
 	}

instead. This introduces no change in behaviour apart from improving the
error message and allows me to continue with my quest to make .remove
return void.

Best regards
Uwe
Alex Elder Nov. 17, 2023, 7:43 p.m. UTC | #3
On 11/17/23 8:45 AM, Uwe Kleine-König wrote:
>>> Fixes: cdf2e9419dd9 ("soc: qcom: ipa: main code")
>> Is this really a bug fix?  This code was doing the right
>> thing even if the caller was not.
> Yes, since cdf2e9419dd9 the driver is leaking resources if
> ipa_modem_stop() fails. I call that a bug.

I understand that.  But the alternative is that we free
those resources and allow the hardware to (eventually)
complete an in-flight operation that touches one of those
resources, which is a use-after-free (rather than a leak),
and I call that a bug too.

The function was returning an error to the caller to
indicate the request failed.  I'm comfortable with
accepting that and just issuing a warning and returning
no error.

The reason I wanted more time to review was that I want to
walk through that code path again and decide which of the
bugs I'd rather keep--and I think it would be the resource
leak.

It's also possible the cleanup function can preclude any
later completion from causing a problem, which would be
the best.  I just don't remember without looking at it
again closely.

					-Alex
Alex Elder Nov. 17, 2023, 8:50 p.m. UTC | #4
On 11/17/23 3:59 AM, Uwe Kleine-König wrote:
> Returning early from .remove() with an error code still results in the
> driver unbinding the device. So the driver core ignores the returned error
> code and the resources that were not freed are never catched up. In
> combination with devm this also often results in use-after-free bugs.
> 
> Here even if the modem cannot be stopped, resources must be freed. So
> replace the early error return by an error message an continue to clean up.
> 
> This prepares changing ipa_remove() to return void.
> 
> Fixes: cdf2e9419dd9 ("soc: qcom: ipa: main code")
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> ---
>   drivers/net/ipa/ipa_main.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
> index da853353a5c7..60e4f590f5de 100644
> --- a/drivers/net/ipa/ipa_main.c
> +++ b/drivers/net/ipa/ipa_main.c
> @@ -960,7 +960,8 @@ static int ipa_remove(struct platform_device *pdev)
>   			ret = ipa_modem_stop(ipa);

This ipa_modem_stop() call is the second one in this function,
which is called if the first attempt returned an error.  The
only error that's returned is -EBUSY, which occurs if a request
to stop the modem arrives at a time when we're in transition.
That is, either we are in the process of starting it up, or
we are in the process of stopping it.  In either case, this
transitional state should come to an end quickly.  This second
call happens after a short delay, giving a chance for the start
or stop that's underway to complete.

If the *second* call returns an error, it's assumed we're stuck
in one of the two transitional states, in which case something is
wrong with the hardware (we've issued a request that did not
complete, for example).  And in that case, we have no way of
knowing whether the hardware will come alive and do something
with a resource that's been allocated for it.

I think I'd rather live with whatever resource leaks occur in
such an unlikely case, rather than free them without knowing
what the (broken) hardware is going to do.

>   		}
>   		if (ret)
> -			return ret;
> +			dev_err(dev, "Failed to stop modem (%pe)\n",
> +				ERR_PTR(ret));

By the above reasoning, I'd rather your patch result in the code
looking like this:

	if (ret) {
		dev_err(dev, "remove: error %d stopping modem\n", ret);
		return ret;		/* XXX Later: just return; */
	}

The message is more consistent with the way other messages in the
driver are written.  If %pe is preferred I'd rather make that change
comprehensively throughout the driver rather than bit-by-bit.

Thanks.

					-Alex

>   		ipa_teardown(ipa);
>   	}
diff mbox series

Patch

diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
index da853353a5c7..60e4f590f5de 100644
--- a/drivers/net/ipa/ipa_main.c
+++ b/drivers/net/ipa/ipa_main.c
@@ -960,7 +960,8 @@  static int ipa_remove(struct platform_device *pdev)
 			ret = ipa_modem_stop(ipa);
 		}
 		if (ret)
-			return ret;
+			dev_err(dev, "Failed to stop modem (%pe)\n",
+				ERR_PTR(ret));
 
 		ipa_teardown(ipa);
 	}