Message ID | 20231117095922.876489-2-u.kleine-koenig@pengutronix.de (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net*: Convert to platform remove callback returning void | expand |
On 11/17/23 3:59 AM, Uwe Kleine-König wrote: > Returning early from .remove() with an error code still results in the > driver unbinding the device. So the driver core ignores the returned error > code and the resources that were not freed are never catched up. In > combination with devm this also often results in use-after-free bugs. > > Here even if the modem cannot be stopped, resources must be freed. So > replace the early error return by an error message an continue to clean up. > > This prepares changing ipa_remove() to return void. > > Fixes: cdf2e9419dd9 ("soc: qcom: ipa: main code") Is this really a bug fix? This code was doing the right thing even if the caller was not. > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> > --- > drivers/net/ipa/ipa_main.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c > index da853353a5c7..60e4f590f5de 100644 > --- a/drivers/net/ipa/ipa_main.c > +++ b/drivers/net/ipa/ipa_main.c > @@ -960,7 +960,8 @@ static int ipa_remove(struct platform_device *pdev) > ret = ipa_modem_stop(ipa); > } > if (ret) > - return ret; > + dev_err(dev, "Failed to stop modem (%pe)\n", > + ERR_PTR(ret)); I think this is not correct, or rather, I think it is less correct than returning early. What's happening here is we're trying to stop the modem. It is an external entity that might have some in-flight activity that could include "owning" some buffers provided by Linux, to be filled with received data. There's a chance that cleaning up (with the call to ipa_teardown()) can do the right thing, but I'm not going to sign off on this until I've looked at that in closer detail. This is something that *could* happen but is not *expected* to happen. We expect stopping the modem to succeed so if it doesn't, something's wrong and it's not 100% clear how to properly handle it. For now... you know a little more about my hesitation, but please wait to commit this change until I've had a chance to spend more time reviewing. -Alex > > ipa_teardown(ipa); > }
Hello Alex, On Fri, Nov 17, 2023 at 08:16:02AM -0600, Alex Elder wrote: > On 11/17/23 3:59 AM, Uwe Kleine-König wrote: > > Returning early from .remove() with an error code still results in the > > driver unbinding the device. So the driver core ignores the returned error > > code and the resources that were not freed are never catched up. In > > combination with devm this also often results in use-after-free bugs. > > > > Here even if the modem cannot be stopped, resources must be freed. So > > replace the early error return by an error message an continue to clean up. > > > > This prepares changing ipa_remove() to return void. > > > > Fixes: cdf2e9419dd9 ("soc: qcom: ipa: main code") > > Is this really a bug fix? This code was doing the right > thing even if the caller was not. Yes, since cdf2e9419dd9 the driver is leaking resources if ipa_modem_stop() fails. I call that a bug. > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> > > --- > > drivers/net/ipa/ipa_main.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c > > index da853353a5c7..60e4f590f5de 100644 > > --- a/drivers/net/ipa/ipa_main.c > > +++ b/drivers/net/ipa/ipa_main.c > > @@ -960,7 +960,8 @@ static int ipa_remove(struct platform_device *pdev) > > ret = ipa_modem_stop(ipa); > > } > > if (ret) > > - return ret; > > + dev_err(dev, "Failed to stop modem (%pe)\n", > > + ERR_PTR(ret)); > > I think this is not correct, or rather, I think it is less > correct than returning early. > > What's happening here is we're trying to stop the modem. > It is an external entity that might have some in-flight > activity that could include "owning" some buffers provided > by Linux, to be filled with received data. There's a > chance that cleaning up (with the call to ipa_teardown()) > can do the right thing, but I'm not going to sign off on > this until I've looked at that in closer detail. > > This is something that *could* happen but is not *expected* > to happen. We expect stopping the modem to succeed so if > it doesn't, something's wrong and it's not 100% clear how > to properly handle it. Returning early is wrong for sure. You skip for example the free_irq step, so the irq stays active, it might trigger and use the unbound device. And as the device is unbound, .remove() is never retried and the irq (and all the other resources) are never freed. Take the time you need to review the changes. If you don't want to accept the change now, I'd like to apply the following change: diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c index da853353a5c7..f77decf0b1e4 100644 --- a/drivers/net/ipa/ipa_main.c +++ b/drivers/net/ipa/ipa_main.c @@ -959,8 +959,11 @@ static int ipa_remove(struct platform_device *pdev) usleep_range(USEC_PER_MSEC, 2 * USEC_PER_MSEC); ret = ipa_modem_stop(ipa); } - if (ret) - return ret; + if (ret) { + dev_err(dev, "Failed to stop modem (%pe)\n", + ERR_PTR(ret)); + return 0; + } ipa_teardown(ipa); } instead. This introduces no change in behaviour apart from improving the error message and allows me to continue with my quest to make .remove return void. Best regards Uwe
On 11/17/23 8:45 AM, Uwe Kleine-König wrote: >>> Fixes: cdf2e9419dd9 ("soc: qcom: ipa: main code") >> Is this really a bug fix? This code was doing the right >> thing even if the caller was not. > Yes, since cdf2e9419dd9 the driver is leaking resources if > ipa_modem_stop() fails. I call that a bug. I understand that. But the alternative is that we free those resources and allow the hardware to (eventually) complete an in-flight operation that touches one of those resources, which is a use-after-free (rather than a leak), and I call that a bug too. The function was returning an error to the caller to indicate the request failed. I'm comfortable with accepting that and just issuing a warning and returning no error. The reason I wanted more time to review was that I want to walk through that code path again and decide which of the bugs I'd rather keep--and I think it would be the resource leak. It's also possible the cleanup function can preclude any later completion from causing a problem, which would be the best. I just don't remember without looking at it again closely. -Alex
On 11/17/23 3:59 AM, Uwe Kleine-König wrote: > Returning early from .remove() with an error code still results in the > driver unbinding the device. So the driver core ignores the returned error > code and the resources that were not freed are never catched up. In > combination with devm this also often results in use-after-free bugs. > > Here even if the modem cannot be stopped, resources must be freed. So > replace the early error return by an error message an continue to clean up. > > This prepares changing ipa_remove() to return void. > > Fixes: cdf2e9419dd9 ("soc: qcom: ipa: main code") > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> > --- > drivers/net/ipa/ipa_main.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c > index da853353a5c7..60e4f590f5de 100644 > --- a/drivers/net/ipa/ipa_main.c > +++ b/drivers/net/ipa/ipa_main.c > @@ -960,7 +960,8 @@ static int ipa_remove(struct platform_device *pdev) > ret = ipa_modem_stop(ipa); This ipa_modem_stop() call is the second one in this function, which is called if the first attempt returned an error. The only error that's returned is -EBUSY, which occurs if a request to stop the modem arrives at a time when we're in transition. That is, either we are in the process of starting it up, or we are in the process of stopping it. In either case, this transitional state should come to an end quickly. This second call happens after a short delay, giving a chance for the start or stop that's underway to complete. If the *second* call returns an error, it's assumed we're stuck in one of the two transitional states, in which case something is wrong with the hardware (we've issued a request that did not complete, for example). And in that case, we have no way of knowing whether the hardware will come alive and do something with a resource that's been allocated for it. I think I'd rather live with whatever resource leaks occur in such an unlikely case, rather than free them without knowing what the (broken) hardware is going to do. > } > if (ret) > - return ret; > + dev_err(dev, "Failed to stop modem (%pe)\n", > + ERR_PTR(ret)); By the above reasoning, I'd rather your patch result in the code looking like this: if (ret) { dev_err(dev, "remove: error %d stopping modem\n", ret); return ret; /* XXX Later: just return; */ } The message is more consistent with the way other messages in the driver are written. If %pe is preferred I'd rather make that change comprehensively throughout the driver rather than bit-by-bit. Thanks. -Alex > ipa_teardown(ipa); > }
diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c index da853353a5c7..60e4f590f5de 100644 --- a/drivers/net/ipa/ipa_main.c +++ b/drivers/net/ipa/ipa_main.c @@ -960,7 +960,8 @@ static int ipa_remove(struct platform_device *pdev) ret = ipa_modem_stop(ipa); } if (ret) - return ret; + dev_err(dev, "Failed to stop modem (%pe)\n", + ERR_PTR(ret)); ipa_teardown(ipa); }
Returning early from .remove() with an error code still results in the driver unbinding the device. So the driver core ignores the returned error code and the resources that were not freed are never catched up. In combination with devm this also often results in use-after-free bugs. Here even if the modem cannot be stopped, resources must be freed. So replace the early error return by an error message an continue to clean up. This prepares changing ipa_remove() to return void. Fixes: cdf2e9419dd9 ("soc: qcom: ipa: main code") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> --- drivers/net/ipa/ipa_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)