Message ID | 20221203133159.94414-1-mailhol.vincent@wanadoo.fr (mailing list archive) |
---|---|
Series | can: usb: remove all usb_set_intfdata(intf, NULL) in drivers' disconnect() | expand |
On 03.12.22 14:31, Vincent Mailhol wrote:
> The core sets the usb_interface to NULL in [1]. Also setting it to
> NULL in usb_driver::disconnect() is at best useless, at worst risky.

Hi,

I am afraid there is a major issue with your series of patches.
The drivers you are removing this from often have a subsequent check
for the data they got from usb_get_intfdata() being NULL.

That pattern is taken from drivers like btusb or CDC-ACM, which
claim secondary interfaces disconnect() will be called a second time
for.
In addition, a driver can use setting intfdata to NULL as a flag
for disconnect() having proceeded to a point where certain things
can no longer be safely done.
You need to check for that in every driver
you remove this code from and if you decide that it can safely be removed,
which is likely, then please also remove checks like this:

        struct ems_usb *dev = usb_get_intfdata(intf);

        usb_set_intfdata(intf, NULL);

        if (dev) {
                unregister_netdev(dev->netdev);

Either it can be called a second time, then you need to leave it
as is, or the check for NULL is superfluous. But only removing setting
the pointer to NULL never makes sense.

Regards
Oliver

On Mon. 5 Dec. 2022 at 17:39, Oliver Neukum <oneukum@suse.com> wrote:
> On 03.12.22 14:31, Vincent Mailhol wrote:
> > The core sets the usb_interface to NULL in [1]. Also setting it to
> > NULL in usb_driver::disconnect() is at best useless, at worst risky.
>
> Hi,
>
> I am afraid there is a major issue with your series of patches.
> The drivers you are removing this from often have a subsequent check
> for the data they got from usb_get_intfdata() being NULL.

ACK, but I do not see the connection.

> That pattern is taken from drivers like btusb or CDC-ACM

Where does CDC-ACM set *his* interface to NULL? Looking at:

  https://elixir.bootlin.com/linux/v6.0/source/drivers/usb/class/cdc-acm.c#L1531

I can see that cdc-acm sets acm->control and acm->data to NULL in his
disconnect(), but it doesn't set its own usb_interface to NULL.

> which claim secondary interfaces disconnect() will be called a second time
> for.

Are you saying that the disconnect() of those CAN USB drivers is being
called twice? I do not see this in the source code. The only caller of
usb_driver::disconnect() I can see is:

  https://elixir.bootlin.com/linux/v6.0/source/drivers/usb/core/driver.c#L458

> In addition, a driver can use setting intfdata to NULL as a flag
> for disconnect() having proceeded to a point where certain things
> can no longer be safely done.

Any reference that a driver can do that? This pattern seems racy.

By the way, I did check all the drivers:

  * ems_usb: intf is only used in ems_usb_probe() and
    ems_usb_disconnect() functions.

  * esd_usb: intf is only used in the esd_usb_probe(),
    esd_usb_probe_one_net() (which is part of probing),
    esd_usb_disconnect() and a couple of sysfs functions (which only
    use intf to get a pointer to struct esd_usb).

  * gs_usb: intf is used several times but only to retrieve struct
    usb_device. This seems useless, I sent this patch to remove it:
    https://lore.kernel.org/linux-can/20221208081142.16936-3-mailhol.vincent@wanadoo.fr/
    Aside from that, intf is only used in gs_usb_probe(),
    gs_make_candev() (which is part of probing) and
    gs_usb_disconnect() functions.

  * kvaser_usb: intf is only used in kvaser_usb_probe() and
    kvaser_usb_disconnect() functions.

  * mcba_usb: intf is only used in mcba_usb_probe() and
    mcba_usb_disconnect() functions.

  * ucan: intf is only used in ucan_probe() and
    ucan_disconnect(). struct ucan_priv also has a pointer to intf
    but it is never used. I sent this patch to remove it:
    https://lore.kernel.org/linux-can/20221208081142.16936-2-mailhol.vincent@wanadoo.fr/

  * usb_8dev: intf is only used in usb_8dev_probe() and
    usb_8dev_disconnect().

With no significant use of intf outside of the probe() and
disconnect(), there is definitely no such "use intf as a flag" in any
of these drivers.

> You need to check for that in every driver
> you remove this code from and if you decide that it can safely be removed,

What makes you assume that I didn't check this in the first place? Or
do you see something I missed?

> which is likely, then please also remove checks like this:
>
>         struct ems_usb *dev = usb_get_intfdata(intf);
>
>         usb_set_intfdata(intf, NULL);
>
>         if (dev) {
>                 unregister_netdev(dev->netdev);

How is the if (dev) check related? There is no correlation between
setting intf to NULL and dev not being NULL. I think dev is never
NULL, but I did not assess that dev could not be NULL.

> Either it can be called a second time, then you need to leave it
> as is,

Really?! The first thing disconnect() does is calling
usb_get_intfdata(intf) which dereferences intf without checking if it
is NULL, c.f.:

  https://elixir.bootlin.com/linux/v6.0/source/include/linux/usb.h#L265

Then it sets intf to NULL. The second time you call disconnect(), the
usb_get_intfdata(intf) would be a NULL pointer dereference.

> or the check for NULL is superfluous. But only removing setting
> the pointer to NULL never makes sense.

Yours sincerely,
Vincent Mailhol

On 08.12.22 10:00, Vincent MAILHOL wrote:
> On Mon. 5 Dec. 2022 at 17:39, Oliver Neukum <oneukum@suse.com> wrote:
>> On 03.12.22 14:31, Vincent Mailhol wrote:

Good Morning!

> ACK, but I do not see the connection.

Well, useless checks are bad. In particular, we should always
make it clear whether a pointer may or may not be NULL.
That is, I have no problem with what you were trying to do
with your patch set. It is a good idea and possibly slightly
overdue. The problem is the method.

> I can see that cdc-acm sets acm->control and acm->data to NULL in his
> disconnect(), but it doesn't set its own usb_interface to NULL.

You don't have to, but you can. I was explaining the two patterns for doing so.

>> which claim secondary interfaces disconnect() will be called a second time
>> for.
>
> Are you saying that the disconnect() of those CAN USB drivers is being
> called twice? I do not see this in the source code. The only caller of
> usb_driver::disconnect() I can see is:
>
> https://elixir.bootlin.com/linux/v6.0/source/drivers/usb/core/driver.c#L458

If they use usb_claim_interface(), yes it is called twice. Once per
interface. That is in the case of ACM once for the originally probed
interface and a second time for the claimed interface.
But not necessarily in that order, as you can be kicked off an interface
via sysfs. Yet you need to cease operations as soon as you are disconnected
from any interface. That is annoying because it means you cannot use a
refcount. From that stems the widespread use of intfdata as a flag.

>> In addition, a driver can use setting intfdata to NULL as a flag
>> for disconnect() having proceeded to a point where certain things
>> can no longer be safely done.
>
> Any reference that a driver can do that? This pattern seems racy.

Technically that is exactly what drivers that use usb_claim_interface()
do. You free everything at the first call and use intfdata as a flag
to prevent a double free.
The race is prevented by usbcore locking, which guarantees that probe()
and disconnect() have mutual exclusion.
If you use intfdata in sysfs, yes additional locking is needed.

> What makes you assume that I didn't check this in the first place? Or
> do you see something I missed?

That you did not put it into the changelogs.
That reads like the drivers are doing something obsolete or stupid.
They do not. They copied something that is necessary only under
some circumstances.

And that you did not remove the checks.

>> which is likely, then please also remove checks like this:
>>
>>         struct ems_usb *dev = usb_get_intfdata(intf);
>>
>>         usb_set_intfdata(intf, NULL);
>>
>>         if (dev) {

Here. If you have a driver that uses usb_claim_interface().
You need this check or you unregister an already unregistered
netdev.

The way this disconnect() method is coded is extremely defensive.
Most drivers do not need this check. But it is never
wrong in the strict sense.

Hence doing a mass removal with a change log that does
not say that this driver is using only a single interface
hence the check can be dropped to reduce code size
is not good.

Regards
Oliver

On Thu. 8 Dec. 2022 at 20:04, Oliver Neukum <oneukum@suse.com> wrote:
> On 08.12.22 10:00, Vincent MAILHOL wrote:
> > On Mon. 5 Dec. 2022 at 17:39, Oliver Neukum <oneukum@suse.com> wrote:
> >> On 03.12.22 14:31, Vincent Mailhol wrote:
>
> Good Morning!

Good night! (different time zone :))

> > ACK, but I do not see the connection.
> Well, useless checks are bad. In particular, we should always
> make it clear whether a pointer may or may not be NULL.
> That is, I have no problem with what you were trying to do
> with your patch set. It is a good idea and possibly slightly
> overdue. The problem is the method.
>
> > I can see that cdc-acm sets acm->control and acm->data to NULL in his
> > disconnect(), but it doesn't set its own usb_interface to NULL.
>
> You don't have to, but you can. I was explaining the two patterns for doing so.
>
> >> which claim secondary interfaces disconnect() will be called a second time
> >> for.
> >
> > Are you saying that the disconnect() of those CAN USB drivers is being
> > called twice? I do not see this in the source code. The only caller of
> > usb_driver::disconnect() I can see is:
> >
> > https://elixir.bootlin.com/linux/v6.0/source/drivers/usb/core/driver.c#L458
>
> If they use usb_claim_interface(), yes it is called twice. Once per
> interface. That is in the case of ACM once for the originally probed
> interface and a second time for the claimed interface.
> But not necessarily in that order, as you can be kicked off an interface
> via sysfs. Yet you need to cease operations as soon as you are disconnected
> from any interface. That is annoying because it means you cannot use a
> refcount. From that stems the widespread use of intfdata as a flag.

Thank you for the details! I better understand this part now.

> >> In addition, a driver can use setting intfdata to NULL as a flag
> >> for disconnect() having proceeded to a point where certain things
> >> can no longer be safely done.
> >
> > Any reference that a driver can do that? This pattern seems racy.
>
> Technically that is exactly what drivers that use usb_claim_interface()
> do. You free everything at the first call and use intfdata as a flag
> to prevent a double free.
> The race is prevented by usbcore locking, which guarantees that probe()
> and disconnect() have mutual exclusion.
> If you use intfdata in sysfs, yes additional locking is needed.

ACK for the mutual exclusion. My question was about what you said in
your previous message:

| In addition, a driver can use setting intfdata to NULL as a flag
| for *disconnect() having proceeded to a point* where certain things
| can no longer be safely done.

How do you check that disconnect() has proceeded *to a given point*
using intf without being racy? You can check if it has already
completed once but not check how far it has proceeded, right?

> > What makes you assume that I didn't check this in the first place? Or
> > do you see something I missed?
>
> That you did not put it into the changelogs.
> That reads like the drivers are doing something obsolete or stupid.
> They do not. They copied something that is necessary only under
> some circumstances.
>
> And that you did not remove the checks.
>
> >> which is likely, then please also remove checks like this:
> >>
> >>         struct ems_usb *dev = usb_get_intfdata(intf);
> >>
> >>         usb_set_intfdata(intf, NULL);
> >>
> >>         if (dev) {
>
> Here. If you have a driver that uses usb_claim_interface().
> You need this check or you unregister an already unregistered
> netdev.

Sorry, but with all my best intentions, I still do not get it. During
the second iteration, intf is NULL and:

        /* equivalent to dev = intf->dev.data. Because intf is NULL,
         * this is a NULL pointer dereference */
        struct ems_usb *dev = usb_get_intfdata(intf);

        /* OK, intf is already NULL */
        usb_set_intfdata(intf, NULL);

        /* follows a NULL pointer dereference so this is undefined
         * behaviour */
        if (dev) {

How is this a valid check that you entered the function for the second
time? If intf is the flag, you should check intf, not dev? Something
like this:

        struct ems_usb *dev;

        if (!intf)
                return;

        dev = usb_get_intfdata(intf);
        /* ... */

I just can not see the connection between intf being NULL and the if
(dev) check. All I see is some undefined behaviour, sorry.

> The way this disconnect() method is coded is extremely defensive.
> Most drivers do not need this check. But it is never
> wrong in the strict sense.
>
> Hence doing a mass removal with a change log that does
> not say that this driver is using only a single interface
> hence the check can be dropped to reduce code size
> is not good.
>
> Regards
> Oliver

On Fri, Dec 09, 2022 at 12:44:51AM +0900, Vincent MAILHOL wrote:
> On Thu. 8 Dec. 2022 at 20:04, Oliver Neukum <oneukum@suse.com> wrote:
> > >> which is likely, then please also remove checks like this:
> > >>
> > >>         struct ems_usb *dev = usb_get_intfdata(intf);
> > >>
> > >>         usb_set_intfdata(intf, NULL);
> > >>
> > >>         if (dev) {
> >
> > Here. If you have a driver that uses usb_claim_interface().
> > You need this check or you unregister an already unregistered
> > netdev.
>
> Sorry, but with all my best intentions, I still do not get it. During
> the second iteration, intf is NULL and:

No, intf is never NULL. Rather, the driver-specific pointer stored in
intfdata may be NULL.

You seem to be confusing intf with intfdata(intf).

>         /* equivalent to dev = intf->dev.data. Because intf is NULL,
>          * this is a NULL pointer dereference */
>         struct ems_usb *dev = usb_get_intfdata(intf);

So here dev will be NULL when the second interface's disconnect routine
runs, because the first time through the routine sets the intfdata to
NULL for both interfaces:

        USB core calls ->disconnect(intf1)

                disconnect routine sets intfdata(intf1) and
                intfdata(intf2) both to NULL and handles the
                disconnection

        USB core calls ->disconnect(intf2)

                disconnect routine sees that intfdata(intf2) is
                already NULL, so it knows that it doesn't need
                to do anything more.

As you can see in this scenario, neither intf1 nor intf2 is ever NULL.

>         /* OK, intf is already NULL */
>         usb_set_intfdata(intf, NULL);
>
>         /* follows a NULL pointer dereference so this is undefined
>          * behaviour */
>         if (dev) {
>
> How is this a valid check that you entered the function for the second
> time? If intf is the flag, you should check intf, not dev? Something
> like this:

intf is not a flag; it is the argument to the function and is never
NULL. The flag is the intfdata.

>         struct ems_usb *dev;
>
>         if (!intf)
>                 return;
>
>         dev = usb_get_intfdata(intf);
>         /* ... */
>
> I just can not see the connection between intf being NULL and the if
> (dev) check. All I see is some undefined behaviour, sorry.

Once you get it straightened out in your head, you will understand.

Alan Stern

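Alan's two-call trace can be reproduced in a small userspace model. The sketch below is not kernel code: struct usb_interface is reduced to a single pointer, usb_get_intfdata() and usb_set_intfdata() are local stand-ins that merely reuse the kernel names, and struct twoif_priv, twoif_disconnect() and twoif_unplug() are hypothetical names invented for this illustration. It only aims to show that intf stays valid in both calls while the intfdata pointer acts as the flag.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel type; NOT the real definition. */
struct usb_interface {
	void *intfdata;	/* models the driver-private pointer */
};

/* Local stand-ins that only mimic the kernel helpers of the same name. */
static void *usb_get_intfdata(struct usb_interface *intf)
{
	return intf->intfdata;
}

static void usb_set_intfdata(struct usb_interface *intf, void *data)
{
	intf->intfdata = data;
}

/* Hypothetical driver state shared by two interfaces. */
struct twoif_priv {
	struct usb_interface *control;
	struct usb_interface *data;
	int teardown_count;	/* how many times teardown really ran */
};

static void twoif_disconnect(struct usb_interface *intf)
{
	struct twoif_priv *priv;

	assert(intf != NULL);	/* the argument itself is never NULL */

	priv = usb_get_intfdata(intf);
	if (!priv)
		return;	/* the sibling's disconnect already did the work */

	/* Clear the flag on BOTH interfaces, then tear down once. */
	usb_set_intfdata(priv->control, NULL);
	usb_set_intfdata(priv->data, NULL);
	priv->teardown_count++;
}

/* Models the core calling disconnect() once per interface, in either order. */
static int twoif_unplug(int data_first)
{
	struct usb_interface intf1 = { NULL }, intf2 = { NULL };
	struct twoif_priv priv = { &intf1, &intf2, 0 };

	usb_set_intfdata(&intf1, &priv);
	usb_set_intfdata(&intf2, &priv);

	twoif_disconnect(data_first ? &intf2 : &intf1);
	twoif_disconnect(data_first ? &intf1 : &intf2);

	return priv.teardown_count;
}
```

In this model, twoif_unplug(0) and twoif_unplug(1) both report a single teardown, which matches Oliver's remark that the two disconnect() calls need not arrive in probe order.
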
On 08.12.22 16:44, Vincent MAILHOL wrote:
> On Thu. 8 Dec. 2022 at 20:04, Oliver Neukum <oneukum@suse.com> wrote:
>> On 08.12.22 10:00, Vincent MAILHOL wrote:
>>> On Mon. 5 Dec. 2022 at 17:39, Oliver Neukum <oneukum@suse.com> wrote:
>>>> On 03.12.22 14:31, Vincent Mailhol wrote:
>>
>> Good Morning!
>
> Good night! (different time zone :))

Good evening!

> How do you check that disconnect() has proceeded *to a given point*
> using intf without being racy? You can check if it has already
> completed once but not check how far it has proceeded, right?

You'd use intfdata, which is a pointer stored in intf.
But other than that the simplest way would be to use a mutex.

Regards
Oliver

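The mutex approach Oliver mentions could look roughly like the userspace sketch below. It is an illustration under stated assumptions, not code from any of the drivers discussed: a pthread mutex stands in for a kernel struct mutex, and struct mtx_dev, mtx_submit_io(), mtx_disconnect() and mtx_sequence() are hypothetical names. The idea shown is that a flag set under the lock marks the point past which other entry points (for example a sysfs handler) must refuse to start new work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical driver state; all names here are illustrative only. */
struct mtx_dev {
	pthread_mutex_t lock;	/* stands in for a kernel struct mutex */
	bool disconnected;	/* set once disconnect() passed the point of no return */
	int io_submitted;	/* counts operations that were allowed to start */
};

/* Models an entry point that must not start work after disconnect. */
static int mtx_submit_io(struct mtx_dev *dev)
{
	int ret = 0;

	pthread_mutex_lock(&dev->lock);
	if (dev->disconnected)
		ret = -1;	/* a kernel driver would typically return -ENODEV */
	else
		dev->io_submitted++;
	pthread_mutex_unlock(&dev->lock);

	return ret;
}

static void mtx_disconnect(struct mtx_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->disconnected = true;
	pthread_mutex_unlock(&dev->lock);
	/* From here on no new I/O can start, so teardown may proceed. */
}

/* Returns 1 if I/O succeeded before disconnect and was refused after it. */
static int mtx_sequence(void)
{
	struct mtx_dev dev = { .disconnected = false, .io_submitted = 0 };
	int before, after;

	pthread_mutex_init(&dev.lock, NULL);

	before = mtx_submit_io(&dev);
	mtx_disconnect(&dev);
	after = mtx_submit_io(&dev);

	assert(dev.io_submitted == 1);	/* only the first request got through */
	pthread_mutex_destroy(&dev.lock);

	return before == 0 && after == -1;
}
```

Compared with overloading intfdata, an explicit flag under a lock makes the "how far has disconnect() proceeded" question answerable from any context that can take the mutex, at the cost of one extra field and lock.
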
Many of the can usb drivers set their driver's priv data to NULL in
their disconnect function using the pattern below:

        struct driver_priv *priv = usb_get_intfdata(intf);

        usb_set_intfdata(intf, NULL);

        if (priv)
                /* ... */

The pattern comes from other drivers which have a secondary interface
and for which disconnect() may be called twice. However, usb can
drivers do not have such a secondary interface.

On the contrary, if a driver sets the driver's priv data to NULL
before all actions relying on the interface-data pointer complete,
there is a risk of NULL pointer dereference. Typically, this is the
case if there are outstanding urbs which have not yet completed when
entering disconnect().

Finally, even if there is a valid reason to set the driver's priv
data to NULL, it would still be useless to do it in
usb_driver::disconnect() because the core sets the driver's data to
NULL in [1] (and this operation is done within a locked context, so
that race conditions should not occur).

The first seven patches fix all drivers which set their usb_interface
to NULL while outstanding URBs might still exist. There is one patch
per driver in order to add the relevant "Fixes:" tag to each of them.

The eighth patch removes the check toward the driver data being
NULL. This just reduces the kernel size so no fixes tag here and all
changes are done in bulk in one single patch.

Finally, the last patch removes in bulk the remaining benign calls to
usb_set_intfdata(intf, NULL) in etas_es58x and peak_usb.

N.B. some other usb drivers outside of the can tree also have the same
issue, but this is out of scope of this series.

[1] function usb_unbind_interface() from drivers/usb/core/driver.c
Link: https://elixir.bootlin.com/linux/v6.0/source/drivers/usb/core/driver.c#L497

---
* Changelog *

v1 -> v2

  * add explanation in the cover letter on the origin of this pattern
    and why it does not apply to can usb drivers.

  * v1 claimed that usb_set_intfdata(intf, NULL) sets the
    usb_interface to NULL. This is incorrect, it is the pointer to
    driver's private data which is set to NULL. Fix this point of
    confusion in commit message.

  * add a patch to clean up the useless check on the driver's private
    data being NULL.

Vincent Mailhol (9):
  can: ems_usb: ems_usb_disconnect(): fix NULL pointer dereference
  can: esd_usb: esd_usb_disconnect(): fix NULL pointer dereference
  can: gs_usb: gs_usb_disconnect(): fix NULL pointer dereference
  can: kvaser_usb: kvaser_usb_disconnect(): fix NULL pointer dereference
  can: mcba_usb: mcba_usb_disconnect(): fix NULL pointer dereference
  can: ucan: ucan_disconnect(): fix NULL pointer dereference
  can: usb_8dev: usb_8dev_disconnect(): fix NULL pointer dereference
  can: usb: remove useless check on driver data
  can: etas_es58x and peak_usb: remove useless call to usb_set_intfdata()

 drivers/net/can/usb/ems_usb.c                | 16 ++++++----------
 drivers/net/can/usb/esd_usb.c                | 18 +++++++-----------
 drivers/net/can/usb/etas_es58x/es58x_core.c  |  1 -
 drivers/net/can/usb/gs_usb.c                 |  7 -------
 .../net/can/usb/kvaser_usb/kvaser_usb_core.c |  9 +--------
 drivers/net/can/usb/mcba_usb.c               |  2 --
 drivers/net/can/usb/peak_usb/pcan_usb_core.c |  2 --
 drivers/net/can/usb/ucan.c                   |  8 ++------
 drivers/net/can/usb/usb_8dev.c               | 13 ++++---------
 9 files changed, 20 insertions(+), 56 deletions(-)

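The ordering hazard described in the cover letter can be illustrated with a userspace model. This is a sketch under assumptions that the cover letter does not spell out: it assumes a completion handler reaches the driver's private data through the intfdata pointer, and it counts a "fault" where a real handler would have dereferenced NULL. struct urbm_intf, struct urbm_priv and the urbm_*() functions are hypothetical names, not taken from any of the patched drivers.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; NOT the kernel definitions. */
struct urbm_intf {
	void *intfdata;	/* models the driver-private pointer */
};

struct urbm_priv {
	int rx_frames;
};

/*
 * Models a URB completion handler that looks up priv through intfdata.
 * Returns -1 where a kernel handler would have dereferenced NULL.
 */
static int urbm_urb_complete(struct urbm_intf *intf)
{
	struct urbm_priv *priv = intf->intfdata;

	if (!priv)
		return -1;
	priv->rx_frames++;
	return 0;
}

/* Returns the number of would-be NULL dereferences during disconnect. */
static int urbm_disconnect(struct urbm_intf *intf, int pending_urbs,
			   int clear_early)
{
	int faults = 0;

	assert(intf != NULL);

	if (clear_early)
		intf->intfdata = NULL;	/* old pattern: clear the pointer first */

	/* Outstanding URBs complete while disconnect() is still running. */
	while (pending_urbs-- > 0)
		if (urbm_urb_complete(intf) < 0)
			faults++;

	/* Models the core clearing the pointer after disconnect() returns. */
	intf->intfdata = NULL;
	return faults;
}

/* Runs one disconnect with three outstanding URBs. */
static int urbm_run(int clear_early)
{
	struct urbm_priv priv = { 0 };
	struct urbm_intf intf = { &priv };

	return urbm_disconnect(&intf, 3, clear_early);
}
```

In this model every outstanding completion faults when the pointer is cleared first, and none do when clearing is left to the end, which is the ordering the series relies on by letting the core do it.
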
Hi, Thanks Alan and Oliver for your patience, really appreciated. And sorry that it took me four messages to realize my mistake. I will send a v2 right now.