diff mbox series

net: ethernet: nxp: Fix a possible memory leak in lpc_mii_probe()

Message ID 20240909092948.1118381-1-ruanjinjie@huawei.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net: ethernet: nxp: Fix a possible memory leak in lpc_mii_probe() | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 24 this patch: 24
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Jinjie Ruan Sept. 9, 2024, 9:29 a.m. UTC
of_phy_find_device() calls bus_find_device(), which calls get_device()
on the returned struct device * to increment the refcount. The current
implementation does not decrement the refcount, which causes memory leak.

So add the missing phy_device_free() call to decrement the
refcount via put_device() to balance the refcount.

Fixes: 3503bf024b3e ("net: lpc_eth: parse phy nodes from device tree")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 drivers/net/ethernet/nxp/lpc_eth.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Andrew Lunn Sept. 9, 2024, 12:54 p.m. UTC | #1
On Mon, Sep 09, 2024 at 05:29:48PM +0800, Jinjie Ruan wrote:
> of_phy_find_device() calls bus_find_device(), which calls get_device()
> on the returned struct device * to increment the refcount. The current
> implementation does not decrement the refcount, which causes memory leak.
> 
> So add the missing phy_device_free() call to decrement the
> refcount via put_device() to balance the refcount.

Why is a device reference counted?

To stop is disappearing.

> @@ -768,6 +768,9 @@ static int lpc_mii_probe(struct net_device *ndev)
>  		return -ENODEV;
>  	}
>  
> +	if (pldat->phy_node)
> +		phy_device_free(phydev);
> +
>  	phydev = phy_connect(ndev, phydev_name(phydev),
>  			     &lpc_handle_link_change,
>  			     lpc_phy_interface_mode(&pldat->pdev->dev));

Think about this code. We use of_phy_find_device to get the device,
taking a reference on it. While we hold that reference, we know it
cannot disappear and we passed it to phy_connect(), passing it into
the phylib layer. Deep down, phy_attach_direct() is called which does
a get_device() taking a reference on the device. That is the phylib
layer saying it is using it, it does not want it to disappear.

Now think about your change. As soon as you new phy_device_free() is
called, the device can disappear. phylib is then going to use
something which has gone. Bad things will happen.

So with changes like this, you need to think about lifetimes of things
being protected by a reference count. When has lpc_mii_probe(), or the
lpc driver as a whole finished with phydev? There are two obvious
alternatives i can think of.

1) It wants to keep hold of the reference until the driver remove() is
called, so you should be releasing the reference in
lpc_eth_drv_remove().

2) Once the phydev is passed to the phylib layer for it to manage,
this driver does not need to care about it any more. So it just needs
to hold the reference until after phy_connect() returns.

Memory leaks are an annoyance, but generally have little effect,
especially in probe/remove code which gets called once. Accessing
something which has gone is going to cause an Opps.

So, you need to think about the lifetime of objects you are
manipulating the reference counts on. You want to state in the commit
message your understanding of these lifetimes so the reviewer can
sanity check them.

FYI: Ignore anything you have learned while fixing device tree
reference counting bugs. Lifetimes of OF objects is probably very
broken.

	Andrew

---
pw-bot: cr
Jinjie Ruan Sept. 9, 2024, 1:17 p.m. UTC | #2
On 2024/9/9 20:54, Andrew Lunn wrote:
> On Mon, Sep 09, 2024 at 05:29:48PM +0800, Jinjie Ruan wrote:
>> of_phy_find_device() calls bus_find_device(), which calls get_device()
>> on the returned struct device * to increment the refcount. The current
>> implementation does not decrement the refcount, which causes memory leak.
>>
>> So add the missing phy_device_free() call to decrement the
>> refcount via put_device() to balance the refcount.
> 
> Why is a device reference counted?
> 
> To stop is disappearing.
> 
>> @@ -768,6 +768,9 @@ static int lpc_mii_probe(struct net_device *ndev)
>>  		return -ENODEV;
>>  	}
>>  
>> +	if (pldat->phy_node)
>> +		phy_device_free(phydev);
>> +
>>  	phydev = phy_connect(ndev, phydev_name(phydev),
>>  			     &lpc_handle_link_change,
>>  			     lpc_phy_interface_mode(&pldat->pdev->dev));
> 
> Think about this code. We use of_phy_find_device to get the device,
> taking a reference on it. While we hold that reference, we know it
> cannot disappear and we passed it to phy_connect(), passing it into
> the phylib layer. Deep down, phy_attach_direct() is called which does
> a get_device() taking a reference on the device. That is the phylib
> layer saying it is using it, it does not want it to disappear.
> 
> Now think about your change. As soon as you new phy_device_free() is
> called, the device can disappear. phylib is then going to use
> something which has gone. Bad things will happen.

Hi, Andrew,
Thank you to share me your a wealth of relevant knowledge. My knowledge
of reference count is relatively shallow.

> 
> So with changes like this, you need to think about lifetimes of things
> being protected by a reference count. When has lpc_mii_probe(), or the
> lpc driver as a whole finished with phydev? There are two obvious
> alternatives i can think of.
> 
> 1) It wants to keep hold of the reference until the driver remove() is
> called, so you should be releasing the reference in
> lpc_eth_drv_remove().
> 
> 2) Once the phydev is passed to the phylib layer for it to manage,
> this driver does not need to care about it any more. So it just needs
> to hold the reference until after phy_connect() returns.

I think this is a good chance to put the device after phy_connect().

> 
> Memory leaks are an annoyance, but generally have little effect,
> especially in probe/remove code which gets called once. Accessing
> something which has gone is going to cause an Opps.
> 
> So, you need to think about the lifetime of objects you are
> manipulating the reference counts on. You want to state in the commit
> message your understanding of these lifetimes so the reviewer can
> sanity check them.>
> FYI: Ignore anything you have learned while fixing device tree
> reference counting bugs. Lifetimes of OF objects is probably very
> broken.
> 
> 	Andrew
> 
> ---
> pw-bot: cr
diff mbox series

Patch

diff --git a/drivers/net/ethernet/nxp/lpc_eth.c b/drivers/net/ethernet/nxp/lpc_eth.c
index dd3e58a1319c..8bff394991e4 100644
--- a/drivers/net/ethernet/nxp/lpc_eth.c
+++ b/drivers/net/ethernet/nxp/lpc_eth.c
@@ -768,6 +768,9 @@  static int lpc_mii_probe(struct net_device *ndev)
 		return -ENODEV;
 	}
 
+	if (pldat->phy_node)
+		phy_device_free(phydev);
+
 	phydev = phy_connect(ndev, phydev_name(phydev),
 			     &lpc_handle_link_change,
 			     lpc_phy_interface_mode(&pldat->pdev->dev));