diff mbox series

[1/2] thunderbolt: Fix a leak in tb_retimer_add()

Message ID YGFulvAa5Kz6HTsd@mwanda (mailing list archive)
State Accepted
Commit bec4d7c93afc07dd0454ae41c559513f858cfb83
Headers show
Series [1/2] thunderbolt: Fix a leak in tb_retimer_add() | expand

Commit Message

Dan Carpenter March 29, 2021, 6:07 a.m. UTC
After the device_register() succeeds, then the correct way to clean up
is to call device_unregister().  The unregister calls both device_del()
and device_put().  Since this code was only device_del() it results in
a memory leak.

Fixes: dacb12877d92 ("thunderbolt: Add support for on-board retimers")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
This is from a new static checker warning.  Not tested.  With new
warnings it's also possible that I have misunderstood something
fundamental so review carefully etc.

 drivers/thunderbolt/retimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jason Gunthorpe March 29, 2021, 1:02 p.m. UTC | #1
On Mon, Mar 29, 2021 at 09:07:18AM +0300, Dan Carpenter wrote:
> After the device_register() succeeds, then the correct way to clean up
> is to call device_unregister().  The unregister calls both device_del()
> and device_put().  Since this code was only device_del() it results in
> a memory leak.
> 
> Fixes: dacb12877d92 ("thunderbolt: Add support for on-board retimers")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> ---
> This is from a new static checker warning.  Not tested.  With new
> warnings it's also possible that I have misunderstood something
> fundamental so review carefully etc.

It looks OK to me

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

This also highlights the code has an ordering issue too, it calls
device_register() then goes to do tb_retimer_nvm_add() however
device_register() makes sysfs attributes visible before the rt->nvm is
initialized and this:

static ssize_t nvm_authenticate_store(struct device *dev,
	struct device_attribute *attr, const char *buf, size_t count)
{
	if (!rt->nvm) {

Isn't strong enough to close the potential racing. The nvm should be
setup before device_register and all the above tests in the sysfs
deleted so we can rely on the CPU barriers built into
device_register() for correctness.

[which is a general tip, be very suspicious if device_register() is
being error unwound]

Jason
Mika Westerberg March 29, 2021, 2:43 p.m. UTC | #2
Hi,

On Mon, Mar 29, 2021 at 10:02:20AM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 29, 2021 at 09:07:18AM +0300, Dan Carpenter wrote:
> > After the device_register() succeeds, then the correct way to clean up
> > is to call device_unregister().  The unregister calls both device_del()
> > and device_put().  Since this code was only device_del() it results in
> > a memory leak.
> > 
> > Fixes: dacb12877d92 ("thunderbolt: Add support for on-board retimers")
> > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> > ---
> > This is from a new static checker warning.  Not tested.  With new
> > warnings it's also possible that I have misunderstood something
> > fundamental so review carefully etc.
> 
> It looks OK to me

I agree too.

> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Thanks for the review!

> This also highlights the code has an ordering issue too, it calls
> device_register() then goes to do tb_retimer_nvm_add() however
> device_register() makes sysfs attributes visible before the rt->nvm is
> initialized and this:
> 
> static ssize_t nvm_authenticate_store(struct device *dev,
> 	struct device_attribute *attr, const char *buf, size_t count)
> {
> 	if (!rt->nvm) {
> 
> Isn't strong enough to close the potential racing. The nvm should be
> setup before device_register and all the above tests in the sysfs
> deleted so we can rely on the CPU barriers built into
> device_register() for correctness.
> 
> [which is a general tip, be very suspicious if device_register() is
> being error unwound]

The nvm is a separate (physical Linux) device that gets added under this
one. It cannot be added before AFAICT.

The code you refer actually looks like this:

static ssize_t nvm_authenticate_store(struct device *dev,
 	struct device_attribute *attr, const char *buf, size_t count)
{
	...
        if (!mutex_trylock(&rt->tb->lock)) {
                ret = restart_syscall();
                goto exit_rpm;
        }

        if (!rt->nvm) {
                ret = -EAGAIN;
                goto exit_unlock;
        }


Idea here is that if the NVMem (nvm) is not yet registered the attribute is
there but we return -EAGAIN to the userspace.
Jason Gunthorpe March 29, 2021, 2:54 p.m. UTC | #3
On Mon, Mar 29, 2021 at 05:43:23PM +0300, Mika Westerberg wrote:

> The nvm is a separate (physical Linux) device that gets added under this
> one. It cannot be added before AFAICT.

Hum, yes, but then it is odd that a parent is holding sysfs attributes
that refer to a child.

> The code you refer actually looks like this:
> 
> static ssize_t nvm_authenticate_store(struct device *dev,
>  	struct device_attribute *attr, const char *buf, size_t count)
> {
> 	...
>         if (!mutex_trylock(&rt->tb->lock)) {
>                 ret = restart_syscall();
>                 goto exit_rpm;
>         }

Is that lock held during tb_retimer_nvm_add() I looked for a bit and
didn't find something. So someplace more than 4 call site above
mandatory locking is being held?

static void tb_retimer_remove(struct tb_retimer *rt)
{
	dev_info(&rt->dev, "retimer disconnected\n");
	tb_nvm_free(rt->nvm);
	device_unregister(&rt->dev);
}

Here too?

And this is why it is all trylock because it deadlocks with unregister
otherwise?

Jason
Mika Westerberg March 29, 2021, 3:06 p.m. UTC | #4
On Mon, Mar 29, 2021 at 11:54:05AM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 29, 2021 at 05:43:23PM +0300, Mika Westerberg wrote:
> 
> > The nvm is a separate (physical Linux) device that gets added under this
> > one. It cannot be added before AFAICT.
> 
> Hum, yes, but then it is odd that a parent is holding sysfs attributes
> that refer to a child.

Well the child (NVMem) comes from completely different subsystem that
does not have a concept of "authentication" or anythin similar. This is
what we add on top. We actually exposer two NVMem devices under each
retimer: one that is the current active one, and then the one that is
used to write the new firmware image.

> > The code you refer actually looks like this:
> > 
> > static ssize_t nvm_authenticate_store(struct device *dev,
> >  	struct device_attribute *attr, const char *buf, size_t count)
> > {
> > 	...
> >         if (!mutex_trylock(&rt->tb->lock)) {
> >                 ret = restart_syscall();
> >                 goto exit_rpm;
> >         }
> 
> Is that lock held during tb_retimer_nvm_add() I looked for a bit and
> didn't find something. So someplace more than 4 call site above
> mandatory locking is being held?

Yes it is. It is called from tb_scan_port() where that lock is held.

> static void tb_retimer_remove(struct tb_retimer *rt)
> {
> 	dev_info(&rt->dev, "retimer disconnected\n");
> 	tb_nvm_free(rt->nvm);
> 	device_unregister(&rt->dev);
> }
> 
> Here too?

Yes.

> And this is why it is all trylock because it deadlocks with unregister
> otherwise?

I tried to explain it in 09f11b6c99fe ("thunderbolt: Take domain lock in
switch sysfs attribute callbacks"), except that at that time we did not
have retimers exposed but the same applies here too.
Mika Westerberg March 30, 2021, 10:41 a.m. UTC | #5
On Mon, Mar 29, 2021 at 09:07:18AM +0300, Dan Carpenter wrote:
> After the device_register() succeeds, then the correct way to clean up
> is to call device_unregister().  The unregister calls both device_del()
> and device_put().  Since this code was only device_del() it results in
> a memory leak.
> 
> Fixes: dacb12877d92 ("thunderbolt: Add support for on-board retimers")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> ---
> This is from a new static checker warning.  Not tested.  With new
> warnings it's also possible that I have misunderstood something
> fundamental so review carefully etc.
> 
>  drivers/thunderbolt/retimer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Both patches applied to thunderbolt.git/fixes. Thanks Dan!
diff mbox series

Patch

diff --git a/drivers/thunderbolt/retimer.c b/drivers/thunderbolt/retimer.c
index 620bcf586ee2..7a5d61604c8b 100644
--- a/drivers/thunderbolt/retimer.c
+++ b/drivers/thunderbolt/retimer.c
@@ -347,7 +347,7 @@  static int tb_retimer_add(struct tb_port *port, u8 index, u32 auth_status)
 	ret = tb_retimer_nvm_add(rt);
 	if (ret) {
 		dev_err(&rt->dev, "failed to add NVM devices: %d\n", ret);
-		device_del(&rt->dev);
+		device_unregister(&rt->dev);
 		return ret;
 	}