Message ID | YGFulvAa5Kz6HTsd@mwanda (mailing list archive) |
---|---|
State | Accepted |
Commit | bec4d7c93afc07dd0454ae41c559513f858cfb83 |
Headers | show |
Series | [1/2] thunderbolt: Fix a leak in tb_retimer_add() | expand |
On Mon, Mar 29, 2021 at 09:07:18AM +0300, Dan Carpenter wrote: > After the device_register() succeeds, then the correct way to clean up > is to call device_unregister(). The unregister calls both device_del() > and device_put(). Since this code was only device_del() it results in > a memory leak. > > Fixes: dacb12877d92 ("thunderbolt: Add support for on-board retimers") > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> > --- > This is from a new static checker warning. Not tested. With new > warnings it's also possible that I have misunderstood something > fundamental so review carefully etc. It looks OK to me Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> This also highlights the code has an ordering issue too, it calls device_register() then goes to do tb_retimer_nvm_add() however device_register() makes sysfs attributes visible before the rt->nvm is initialized and this: static ssize_t nvm_authenticate_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { if (!rt->nvm) { Isn't strong enough to close the potential racing. The nvm should be setup before device_register and all the above tests in the sysfs deleted so we can rely on the CPU barriers built into device_register() for correctness. [which is a general tip, be very suspicious if device_register() is being error unwound] Jason
Hi, On Mon, Mar 29, 2021 at 10:02:20AM -0300, Jason Gunthorpe wrote: > On Mon, Mar 29, 2021 at 09:07:18AM +0300, Dan Carpenter wrote: > > After the device_register() succeeds, then the correct way to clean up > > is to call device_unregister(). The unregister calls both device_del() > > and device_put(). Since this code was only device_del() it results in > > a memory leak. > > > > Fixes: dacb12877d92 ("thunderbolt: Add support for on-board retimers") > > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> > > --- > > This is from a new static checker warning. Not tested. With new > > warnings it's also possible that I have misunderstood something > > fundamental so review carefully etc. > > It looks OK to me I agree too. > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Thanks for the review! > This also highlights the code has an ordering issue too, it calls > device_register() then goes to do tb_retimer_nvm_add() however > device_register() makes sysfs attributes visible before the rt->nvm is > initialized and this: > > static ssize_t nvm_authenticate_store(struct device *dev, > struct device_attribute *attr, const char *buf, size_t count) > { > if (!rt->nvm) { > > Isn't strong enough to close the potential racing. The nvm should be > setup before device_register and all the above tests in the sysfs > deleted so we can rely on the CPU barriers built into > device_register() for correctness. > > [which is a general tip, be very suspicious if device_register() is > being error unwound] The nvm is a separate (physical Linux) device that gets added under this one. It cannot be added before AFAICT. The code you refer actually looks like this: static ssize_t nvm_authenticate_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { ... if (!mutex_trylock(&rt->tb->lock)) { ret = restart_syscall(); goto exit_rpm; } if (!rt->nvm) { ret = -EAGAIN; goto exit_unlock; } Idea here is that if the NVMem (nvm) is not yet registered the attribute is there but we return -EAGAIN to the userspace.
On Mon, Mar 29, 2021 at 05:43:23PM +0300, Mika Westerberg wrote: > The nvm is a separate (physical Linux) device that gets added under this > one. It cannot be added before AFAICT. Hum, yes, but then it is odd that a parent is holding sysfs attributes that refer to a child. > The code you refer actually looks like this: > > static ssize_t nvm_authenticate_store(struct device *dev, > struct device_attribute *attr, const char *buf, size_t count) > { > ... > if (!mutex_trylock(&rt->tb->lock)) { > ret = restart_syscall(); > goto exit_rpm; > } Is that lock held during tb_retimer_nvm_add() I looked for a bit and didn't find something. So someplace more than 4 call site above mandatory locking is being held? static void tb_retimer_remove(struct tb_retimer *rt) { dev_info(&rt->dev, "retimer disconnected\n"); tb_nvm_free(rt->nvm); device_unregister(&rt->dev); } Here too? And this is why it is all trylock because it deadlocks with unregister otherwise? Jason
On Mon, Mar 29, 2021 at 11:54:05AM -0300, Jason Gunthorpe wrote: > On Mon, Mar 29, 2021 at 05:43:23PM +0300, Mika Westerberg wrote: > > > The nvm is a separate (physical Linux) device that gets added under this > > one. It cannot be added before AFAICT. > > Hum, yes, but then it is odd that a parent is holding sysfs attributes > that refer to a child. Well the child (NVMem) comes from completely different subsystem that does not have a concept of "authentication" or anythin similar. This is what we add on top. We actually exposer two NVMem devices under each retimer: one that is the current active one, and then the one that is used to write the new firmware image. > > The code you refer actually looks like this: > > > > static ssize_t nvm_authenticate_store(struct device *dev, > > struct device_attribute *attr, const char *buf, size_t count) > > { > > ... > > if (!mutex_trylock(&rt->tb->lock)) { > > ret = restart_syscall(); > > goto exit_rpm; > > } > > Is that lock held during tb_retimer_nvm_add() I looked for a bit and > didn't find something. So someplace more than 4 call site above > mandatory locking is being held? Yes it is. It is called from tb_scan_port() where that lock is held. > static void tb_retimer_remove(struct tb_retimer *rt) > { > dev_info(&rt->dev, "retimer disconnected\n"); > tb_nvm_free(rt->nvm); > device_unregister(&rt->dev); > } > > Here too? Yes. > And this is why it is all trylock because it deadlocks with unregister > otherwise? I tried to explain it in 09f11b6c99fe ("thunderbolt: Take domain lock in switch sysfs attribute callbacks"), except that at that time we did not have retimers exposed but the same applies here too.
On Mon, Mar 29, 2021 at 09:07:18AM +0300, Dan Carpenter wrote: > After the device_register() succeeds, then the correct way to clean up > is to call device_unregister(). The unregister calls both device_del() > and device_put(). Since this code was only device_del() it results in > a memory leak. > > Fixes: dacb12877d92 ("thunderbolt: Add support for on-board retimers") > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> > --- > This is from a new static checker warning. Not tested. With new > warnings it's also possible that I have misunderstood something > fundamental so review carefully etc. > > drivers/thunderbolt/retimer.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Both patches applied to thunderbolt.git/fixes. Thanks Dan!
diff --git a/drivers/thunderbolt/retimer.c b/drivers/thunderbolt/retimer.c index 620bcf586ee2..7a5d61604c8b 100644 --- a/drivers/thunderbolt/retimer.c +++ b/drivers/thunderbolt/retimer.c @@ -347,7 +347,7 @@ static int tb_retimer_add(struct tb_port *port, u8 index, u32 auth_status) ret = tb_retimer_nvm_add(rt); if (ret) { dev_err(&rt->dev, "failed to add NVM devices: %d\n", ret); - device_del(&rt->dev); + device_unregister(&rt->dev); return ret; }
After the device_register() succeeds, then the correct way to clean up is to call device_unregister(). The unregister calls both device_del() and device_put(). Since this code was only device_del() it results in a memory leak. Fixes: dacb12877d92 ("thunderbolt: Add support for on-board retimers") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> --- This is from a new static checker warning. Not tested. With new warnings it's also possible that I have misunderstood something fundamental so review carefully etc. drivers/thunderbolt/retimer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)