
[for-rc,1/4] IB/hfi1: Call xa_destroy before freeing dummy_netdev

Message ID 1617025700-31865-2-git-send-email-dennis.dalessandro@cornelisnetworks.com (mailing list archive)
State Rejected
Delegated to: Jason Gunthorpe
Series hfi fixes

Commit Message

Dennis Dalessandro March 29, 2021, 1:48 p.m. UTC
From: Kaike Wan <kaike.wan@intel.com>

Before the dummy_netdev is freed, xa_destroy() should be called to
free any internal objects to avoid a potential memory leak.

Fixes: 06bde82c72d5 ("IB/hfi1: Add rx functions for dummy netdev")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
---
 drivers/infiniband/hw/hfi1/netdev_rx.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Jason Gunthorpe March 29, 2021, 2:09 p.m. UTC | #1
On Mon, Mar 29, 2021 at 09:48:17AM -0400, dennis.dalessandro@cornelisnetworks.com wrote:

> diff --git a/drivers/infiniband/hw/hfi1/netdev_rx.c b/drivers/infiniband/hw/hfi1/netdev_rx.c
> index 2c8bc02..cec02e8 100644
> +++ b/drivers/infiniband/hw/hfi1/netdev_rx.c
> @@ -372,7 +372,11 @@ int hfi1_netdev_alloc(struct hfi1_devdata *dd)
>  void hfi1_netdev_free(struct hfi1_devdata *dd)
>  {
>  	if (dd->dummy_netdev) {
> +		struct hfi1_netdev_priv *priv =
> +			hfi1_netdev_priv(dd->dummy_netdev);
> +
>  		dd_dev_info(dd, "hfi1 netdev freed\n");
> +		xa_destroy(&priv->dev_tbl);
>  		kfree(dd->dummy_netdev);
>  		dd->dummy_netdev = NULL;

This is doing kfree() on a struct net_device?? Huh?

You should have put this in your own struct and used container_of, not
co-opted netdev_priv, then free your own struct.

It is a bit weird to see an xa_destroy like this. How did things get to
the point that no concurrent thread can see the xarray but there is
still stuff stored in it?

And it is weird this is storing two different types in it too, with no
refcounting..

Jason
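
For illustration only, and not code from this thread: a minimal userspace sketch of the "own struct plus container_of" layout suggested above. The names demo_netdev, demo_rx, and the demo_rx_* helpers are hypothetical stand-ins, and the container_of macro is redefined locally because the kernel header is not available outside the kernel. The point is only that the driver allocates and frees its own struct, and recovers it from the embedded device pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(): same pointer arithmetic. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct net_device. */
struct demo_netdev {
	int ifindex;
};

/* Hypothetical driver-owned struct that embeds the dummy device. */
struct demo_rx {
	int num_rx_q;
	struct demo_netdev rx_napi;	/* embedded, not separately allocated */
};

/* Recover the owning struct from a pointer to the embedded device. */
struct demo_rx *demo_rx_from_netdev(struct demo_netdev *ndev)
{
	assert(ndev != NULL);
	return container_of(ndev, struct demo_rx, rx_napi);
}

/* Allocate the driver's own struct; the embedded device comes with it. */
struct demo_rx *demo_rx_alloc(int num_rx_q)
{
	struct demo_rx *rx = calloc(1, sizeof(*rx));

	if (rx)
		rx->num_rx_q = num_rx_q;
	return rx;
}

/* Free the driver's own struct, never the embedded device by itself. */
void demo_rx_free(struct demo_rx *rx)
{
	free(rx);
}
```

With this layout the free path releases a type the driver itself allocated, so there is no kfree() of something that looks like a struct net_device.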
Dennis Dalessandro March 31, 2021, 7:36 p.m. UTC | #2
On 3/29/2021 10:09 AM, Jason Gunthorpe wrote:
> On Mon, Mar 29, 2021 at 09:48:17AM -0400, dennis.dalessandro@cornelisnetworks.com wrote:
> 
>> diff --git a/drivers/infiniband/hw/hfi1/netdev_rx.c b/drivers/infiniband/hw/hfi1/netdev_rx.c
>> index 2c8bc02..cec02e8 100644
>> +++ b/drivers/infiniband/hw/hfi1/netdev_rx.c
>> @@ -372,7 +372,11 @@ int hfi1_netdev_alloc(struct hfi1_devdata *dd)
>>   void hfi1_netdev_free(struct hfi1_devdata *dd)
>>   {
>>   	if (dd->dummy_netdev) {
>> +		struct hfi1_netdev_priv *priv =
>> +			hfi1_netdev_priv(dd->dummy_netdev);
>> +
>>   		dd_dev_info(dd, "hfi1 netdev freed\n");
>> +		xa_destroy(&priv->dev_tbl);
>>   		kfree(dd->dummy_netdev);
>>   		dd->dummy_netdev = NULL;
> 
> This is doing kfree() on a struct net_device?? Huh?
> 
> You should have put this in your own struct and used container_of not
> co-oped netdev_priv, then free your own struct.
> 
> It is a bit weird to see a xa_destroy like this, how did things get ot
> the point that no concurrent thread can see the xarray but there is
> still stuff stored in it?
> 
> And it is weird this is storing two different types in it too, with no
> refcounting..

We do rework this stuff in the other patch series.

https://patchwork.kernel.org/project/linux-rdma/patch/1617026056-50483-11-git-send-email-dennis.dalessandro@cornelisnetworks.com/

If we fix it up in the for-next series, what should we do about stable?

-Denny
Greg KH April 1, 2021, 6:06 a.m. UTC | #3
On Wed, Mar 31, 2021 at 03:36:14PM -0400, Dennis Dalessandro wrote:
> On 3/29/2021 10:09 AM, Jason Gunthorpe wrote:
> > On Mon, Mar 29, 2021 at 09:48:17AM -0400, dennis.dalessandro@cornelisnetworks.com wrote:
> > 
> > > diff --git a/drivers/infiniband/hw/hfi1/netdev_rx.c b/drivers/infiniband/hw/hfi1/netdev_rx.c
> > > index 2c8bc02..cec02e8 100644
> > > +++ b/drivers/infiniband/hw/hfi1/netdev_rx.c
> > > @@ -372,7 +372,11 @@ int hfi1_netdev_alloc(struct hfi1_devdata *dd)
> > >   void hfi1_netdev_free(struct hfi1_devdata *dd)
> > >   {
> > >   	if (dd->dummy_netdev) {
> > > +		struct hfi1_netdev_priv *priv =
> > > +			hfi1_netdev_priv(dd->dummy_netdev);
> > > +
> > >   		dd_dev_info(dd, "hfi1 netdev freed\n");
> > > +		xa_destroy(&priv->dev_tbl);
> > >   		kfree(dd->dummy_netdev);
> > >   		dd->dummy_netdev = NULL;
> > 
> > This is doing kfree() on a struct net_device?? Huh?
> > 
> > You should have put this in your own struct and used container_of not
> > co-oped netdev_priv, then free your own struct.
> > 
> > It is a bit weird to see a xa_destroy like this, how did things get ot
> > the point that no concurrent thread can see the xarray but there is
> > still stuff stored in it?
> > 
> > And it is weird this is storing two different types in it too, with no
> > refcounting..
> 
> We do rework this stuff in the other patch series.
> 
> https://patchwork.kernel.org/project/linux-rdma/patch/1617026056-50483-11-git-send-email-dennis.dalessandro@cornelisnetworks.com/
> 
> If we fix it up in the for-next series, what should we do about stable?

What does stable matter? Why can it not just take the same patches that
end up in Linus's tree?

thanks,

greg k-h
Jason Gunthorpe April 1, 2021, 12:33 p.m. UTC | #4
On Wed, Mar 31, 2021 at 03:36:14PM -0400, Dennis Dalessandro wrote:
> On 3/29/2021 10:09 AM, Jason Gunthorpe wrote:
> > On Mon, Mar 29, 2021 at 09:48:17AM -0400, dennis.dalessandro@cornelisnetworks.com wrote:
> > 
> > > diff --git a/drivers/infiniband/hw/hfi1/netdev_rx.c b/drivers/infiniband/hw/hfi1/netdev_rx.c
> > > index 2c8bc02..cec02e8 100644
> > > +++ b/drivers/infiniband/hw/hfi1/netdev_rx.c
> > > @@ -372,7 +372,11 @@ int hfi1_netdev_alloc(struct hfi1_devdata *dd)
> > >   void hfi1_netdev_free(struct hfi1_devdata *dd)
> > >   {
> > >   	if (dd->dummy_netdev) {
> > > +		struct hfi1_netdev_priv *priv =
> > > +			hfi1_netdev_priv(dd->dummy_netdev);
> > > +
> > >   		dd_dev_info(dd, "hfi1 netdev freed\n");
> > > +		xa_destroy(&priv->dev_tbl);
> > >   		kfree(dd->dummy_netdev);
> > >   		dd->dummy_netdev = NULL;
> > 
> > This is doing kfree() on a struct net_device?? Huh?
> > 
> > You should have put this in your own struct and used container_of not
> > co-oped netdev_priv, then free your own struct.
> > 
> > It is a bit weird to see a xa_destroy like this, how did things get ot
> > the point that no concurrent thread can see the xarray but there is
> > still stuff stored in it?
> > 
> > And it is weird this is storing two different types in it too, with no
> > refcounting..
> 
> We do rework this stuff in the other patch series.
> 
> https://patchwork.kernel.org/project/linux-rdma/patch/1617026056-50483-11-git-send-email-dennis.dalessandro@cornelisnetworks.com/
> 
> If we fix it up in the for-next series, what should we do about stable?

Well, if you are fixing bugs then order it bug fixes first, but this
is tagged for rc and you still need to explain what bug it is actually
fixing.

xa_destroy is not required if the xarray is already empty, so the
commit message at least needs to explain how we get to a point where
it still has something in it.

Jason
Wan, Kaike April 1, 2021, 1:42 p.m. UTC | #5
> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, April 01, 2021 8:33 AM
> To: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
> Cc: dledford@redhat.com; linux-rdma@vger.kernel.org; Wan, Kaike
> <kaike.wan@intel.com>; stable@vger.kernel.org
> Subject: Re: [PATCH for-rc 1/4] IB/hfi1: Call xa_destroy before freeing
> dummy_netdev
> 
> On Wed, Mar 31, 2021 at 03:36:14PM -0400, Dennis Dalessandro wrote:
> > On 3/29/2021 10:09 AM, Jason Gunthorpe wrote:
> > > On Mon, Mar 29, 2021 at 09:48:17AM -0400,
> dennis.dalessandro@cornelisnetworks.com wrote:
> > >
> > > > diff --git a/drivers/infiniband/hw/hfi1/netdev_rx.c
> > > > b/drivers/infiniband/hw/hfi1/netdev_rx.c
> > > > index 2c8bc02..cec02e8 100644
> > > > +++ b/drivers/infiniband/hw/hfi1/netdev_rx.c
> > > > @@ -372,7 +372,11 @@ int hfi1_netdev_alloc(struct hfi1_devdata *dd)
> > > >   void hfi1_netdev_free(struct hfi1_devdata *dd)
> > > >   {
> > > >   	if (dd->dummy_netdev) {
> > > > +		struct hfi1_netdev_priv *priv =
> > > > +			hfi1_netdev_priv(dd->dummy_netdev);
> > > > +
> > > >   		dd_dev_info(dd, "hfi1 netdev freed\n");
> > > > +		xa_destroy(&priv->dev_tbl);
> > > >   		kfree(dd->dummy_netdev);
> > > >   		dd->dummy_netdev = NULL;
> > >
> > > This is doing kfree() on a struct net_device?? Huh?
> > >
> > > You should have put this in your own struct and used container_of
> > > not co-oped netdev_priv, then free your own struct.
> > >
> > > It is a bit weird to see a xa_destroy like this, how did things get
> > > ot the point that no concurrent thread can see the xarray but there
> > > is still stuff stored in it?
> > >
> > > And it is weird this is storing two different types in it too, with
> > > no refcounting..
> >
> > We do rework this stuff in the other patch series.
> >
> > https://patchwork.kernel.org/project/linux-rdma/patch/1617026056-50483
> > -11-git-send-email-dennis.dalessandro@cornelisnetworks.com/
> >
> > If we fix it up in the for-next series, what should we do about stable?
> 
> Well, if you are fixing bugs then order it bug fixes first, but this is tagged for rc
> and you still need to explain what bug it is actually fixing.
> 
> xa_destroy is not required if the xarray is already empty, so the commit
> message at least needs to explain how we get to a point where it still has
> something in it.
[Wan, Kaike] Shouldn't xa_destroy() always be called during cleanup, in case
something is left behind?
Check the following:
static void ib_device_release(struct device *device)
{
	....
	xa_destroy(&dev->compat_devs);
	xa_destroy(&dev->client_data);
	kfree_rcu(dev, rcu_head);
}

> 
> Jason
Jason Gunthorpe April 1, 2021, 1:48 p.m. UTC | #6
On Thu, Apr 01, 2021 at 01:42:57PM +0000, Wan, Kaike wrote:

> Shouldn't xa_destroy() always be called during cleanup, just in case
> that something is left behind?

No.

> Check the following:

Since I didn't write a WARN_ON(!xa_empty()) it means they were not
made empty.

IIRC there is some special stuff there with XA_ZERO_ENTRY that causes
it.

Jason
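
For illustration only, and not code from this thread: a small userspace model of the teardown argument made above. It assumes a hypothetical fixed-size table, demo_tbl, standing in for an xarray; none of these helpers are kernel APIs. The idea sketched here is that every stored entry is erased by whoever stored it, so the final teardown only has to check that the table is empty and report a bug if it is not, rather than silently discarding leftovers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_TBL_SLOTS 8

/* Hypothetical fixed-size table standing in for an xarray. */
struct demo_tbl {
	void *slot[DEMO_TBL_SLOTS];
};

/* Store an entry; fails if the index is out of range or already used. */
bool demo_tbl_insert(struct demo_tbl *t, size_t idx, void *entry)
{
	assert(t != NULL);
	if (idx >= DEMO_TBL_SLOTS || !entry || t->slot[idx])
		return false;
	t->slot[idx] = entry;
	return true;
}

/* Remove and return the entry at idx, or NULL if nothing was stored. */
void *demo_tbl_erase(struct demo_tbl *t, size_t idx)
{
	void *old;

	assert(t != NULL);
	if (idx >= DEMO_TBL_SLOTS)
		return NULL;
	old = t->slot[idx];
	t->slot[idx] = NULL;
	return old;
}

/* True when no slot holds an entry. */
bool demo_tbl_empty(const struct demo_tbl *t)
{
	assert(t != NULL);
	for (size_t i = 0; i < DEMO_TBL_SLOTS; i++)
		if (t->slot[i])
			return false;
	return true;
}

/*
 * Teardown check, loosely modelled on the WARN_ON(!xa_empty()) idea:
 * leftover entries are reported as a bug instead of being dropped quietly.
 * Returns true when the table was already empty.
 */
bool demo_tbl_teardown(const struct demo_tbl *t)
{
	bool clean = demo_tbl_empty(t);

	if (!clean)
		fprintf(stderr, "demo_tbl: entries left at teardown\n");
	return clean;
}
```

In this model a failing teardown check points at a missing erase somewhere earlier in the lifecycle, which is the kind of explanation the commit message was asked to provide.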
Dennis Dalessandro April 1, 2021, 2:02 p.m. UTC | #7
On 4/1/2021 2:06 AM, Greg KH wrote:
> On Wed, Mar 31, 2021 at 03:36:14PM -0400, Dennis Dalessandro wrote:
>> On 3/29/2021 10:09 AM, Jason Gunthorpe wrote:
>>> On Mon, Mar 29, 2021 at 09:48:17AM -0400, dennis.dalessandro@cornelisnetworks.com wrote:
>>>
>>>> diff --git a/drivers/infiniband/hw/hfi1/netdev_rx.c b/drivers/infiniband/hw/hfi1/netdev_rx.c
>>>> index 2c8bc02..cec02e8 100644
>>>> +++ b/drivers/infiniband/hw/hfi1/netdev_rx.c
>>>> @@ -372,7 +372,11 @@ int hfi1_netdev_alloc(struct hfi1_devdata *dd)
>>>>    void hfi1_netdev_free(struct hfi1_devdata *dd)
>>>>    {
>>>>    	if (dd->dummy_netdev) {
>>>> +		struct hfi1_netdev_priv *priv =
>>>> +			hfi1_netdev_priv(dd->dummy_netdev);
>>>> +
>>>>    		dd_dev_info(dd, "hfi1 netdev freed\n");
>>>> +		xa_destroy(&priv->dev_tbl);
>>>>    		kfree(dd->dummy_netdev);
>>>>    		dd->dummy_netdev = NULL;
>>>
>>> This is doing kfree() on a struct net_device?? Huh?
>>>
>>> You should have put this in your own struct and used container_of not
>>> co-oped netdev_priv, then free your own struct.
>>>
>>> It is a bit weird to see a xa_destroy like this, how did things get ot
>>> the point that no concurrent thread can see the xarray but there is
>>> still stuff stored in it?
>>>
>>> And it is weird this is storing two different types in it too, with no
>>> refcounting..
>>
>> We do rework this stuff in the other patch series.
>>
>> https://patchwork.kernel.org/project/linux-rdma/patch/1617026056-50483-11-git-send-email-dennis.dalessandro@cornelisnetworks.com/
>>
>> If we fix it up in the for-next series, what should we do about stable?
> 
> What does stable matter?  WHy can it not just take the same patches that
> end up in Linus's tree?

Guess it's more of a general question. What is the best way to handle 
things if the code changes drastically in Linus' tree, to the point 
where the bug no longer exists there, but does in stable?

-Denny
Greg KH April 1, 2021, 2:12 p.m. UTC | #8
On Thu, Apr 01, 2021 at 10:02:30AM -0400, Dennis Dalessandro wrote:
> On 4/1/2021 2:06 AM, Greg KH wrote:
> > On Wed, Mar 31, 2021 at 03:36:14PM -0400, Dennis Dalessandro wrote:
> > > On 3/29/2021 10:09 AM, Jason Gunthorpe wrote:
> > > > On Mon, Mar 29, 2021 at 09:48:17AM -0400, dennis.dalessandro@cornelisnetworks.com wrote:
> > > > 
> > > > > diff --git a/drivers/infiniband/hw/hfi1/netdev_rx.c b/drivers/infiniband/hw/hfi1/netdev_rx.c
> > > > > index 2c8bc02..cec02e8 100644
> > > > > +++ b/drivers/infiniband/hw/hfi1/netdev_rx.c
> > > > > @@ -372,7 +372,11 @@ int hfi1_netdev_alloc(struct hfi1_devdata *dd)
> > > > >    void hfi1_netdev_free(struct hfi1_devdata *dd)
> > > > >    {
> > > > >    	if (dd->dummy_netdev) {
> > > > > +		struct hfi1_netdev_priv *priv =
> > > > > +			hfi1_netdev_priv(dd->dummy_netdev);
> > > > > +
> > > > >    		dd_dev_info(dd, "hfi1 netdev freed\n");
> > > > > +		xa_destroy(&priv->dev_tbl);
> > > > >    		kfree(dd->dummy_netdev);
> > > > >    		dd->dummy_netdev = NULL;
> > > > 
> > > > This is doing kfree() on a struct net_device?? Huh?
> > > > 
> > > > You should have put this in your own struct and used container_of not
> > > > co-oped netdev_priv, then free your own struct.
> > > > 
> > > > It is a bit weird to see a xa_destroy like this, how did things get ot
> > > > the point that no concurrent thread can see the xarray but there is
> > > > still stuff stored in it?
> > > > 
> > > > And it is weird this is storing two different types in it too, with no
> > > > refcounting..
> > > 
> > > We do rework this stuff in the other patch series.
> > > 
> > > https://patchwork.kernel.org/project/linux-rdma/patch/1617026056-50483-11-git-send-email-dennis.dalessandro@cornelisnetworks.com/
> > > 
> > > If we fix it up in the for-next series, what should we do about stable?
> > 
> > What does stable matter?  WHy can it not just take the same patches that
> > end up in Linus's tree?
> 
> Guess it's more of a general question. What is the best way to handle things
> if the code changes drastically in Linus' tree, to the point where the bug
> no longer exists there, but does in stable?

Documentation/process/stable-kernel-rules.rst should be your first stop
for stuff like this.  Why not just take those "drastic changes" into the
stable kernel as well?

If for some reason that is impossible, then just email a patch to stable
and document the heck out of why this is not in Linus's tree and what
you have done to ensure that this change is correct.  And get the
maintainer to agree.  And be ready to fix it up again afterward as 90%
of the time we do this, the "new patch" causes problems :)

thanks,

greg k-h
Dennis Dalessandro April 1, 2021, 3 p.m. UTC | #9
On 4/1/2021 10:12 AM, Greg KH wrote:
> On Thu, Apr 01, 2021 at 10:02:30AM -0400, Dennis Dalessandro wrote:
>> On 4/1/2021 2:06 AM, Greg KH wrote:
>>> On Wed, Mar 31, 2021 at 03:36:14PM -0400, Dennis Dalessandro wrote:
>>>> On 3/29/2021 10:09 AM, Jason Gunthorpe wrote:
>>>>> On Mon, Mar 29, 2021 at 09:48:17AM -0400, dennis.dalessandro@cornelisnetworks.com wrote:
>>>>>
>>>>>> diff --git a/drivers/infiniband/hw/hfi1/netdev_rx.c b/drivers/infiniband/hw/hfi1/netdev_rx.c
>>>>>> index 2c8bc02..cec02e8 100644
>>>>>> +++ b/drivers/infiniband/hw/hfi1/netdev_rx.c
>>>>>> @@ -372,7 +372,11 @@ int hfi1_netdev_alloc(struct hfi1_devdata *dd)
>>>>>>     void hfi1_netdev_free(struct hfi1_devdata *dd)
>>>>>>     {
>>>>>>     	if (dd->dummy_netdev) {
>>>>>> +		struct hfi1_netdev_priv *priv =
>>>>>> +			hfi1_netdev_priv(dd->dummy_netdev);
>>>>>> +
>>>>>>     		dd_dev_info(dd, "hfi1 netdev freed\n");
>>>>>> +		xa_destroy(&priv->dev_tbl);
>>>>>>     		kfree(dd->dummy_netdev);
>>>>>>     		dd->dummy_netdev = NULL;
>>>>>
>>>>> This is doing kfree() on a struct net_device?? Huh?
>>>>>
>>>>> You should have put this in your own struct and used container_of not
>>>>> co-oped netdev_priv, then free your own struct.
>>>>>
>>>>> It is a bit weird to see a xa_destroy like this, how did things get ot
>>>>> the point that no concurrent thread can see the xarray but there is
>>>>> still stuff stored in it?
>>>>>
>>>>> And it is weird this is storing two different types in it too, with no
>>>>> refcounting..
>>>>
>>>> We do rework this stuff in the other patch series.
>>>>
>>>> https://patchwork.kernel.org/project/linux-rdma/patch/1617026056-50483-11-git-send-email-dennis.dalessandro@cornelisnetworks.com/
>>>>
>>>> If we fix it up in the for-next series, what should we do about stable?
>>>
>>> What does stable matter?  WHy can it not just take the same patches that
>>> end up in Linus's tree?
>>
>> Guess it's more of a general question. What is the best way to handle things
>> if the code changes drastically in Linus' tree, to the point where the bug
>> no longer exists there, but does in stable?
> 
> Documentation/process/stable-kernel-rules.rst should be your first stop
> for stuff like this.  Why not just take those "drastic changes" into the
> stable kernel as well?

Yep, indeed it was my first stop :) and right at the top, it cannot be 
bigger than 100 lines, must fix only one thing, etc etc. That's what got 
me wondering about all this.

> If for some reason that is impossible, then just email a patch to stable
> and document the heck out of why this is not in Linus's tree and what
> you have done to ensure that this change is correct.  And get the
> maintainer to agree.  And be ready to fix it up again afterward as 90%
> of the time we do this, the "new patch" causes problems :)

Makes total sense. Definitely not the route we want to take, and not 
applicable for this current patch anyway.

Appreciate the advice!

-Denny

Patch

diff --git a/drivers/infiniband/hw/hfi1/netdev_rx.c b/drivers/infiniband/hw/hfi1/netdev_rx.c
index 2c8bc02..cec02e8 100644
--- a/drivers/infiniband/hw/hfi1/netdev_rx.c
+++ b/drivers/infiniband/hw/hfi1/netdev_rx.c
@@ -372,7 +372,11 @@  int hfi1_netdev_alloc(struct hfi1_devdata *dd)
 void hfi1_netdev_free(struct hfi1_devdata *dd)
 {
 	if (dd->dummy_netdev) {
+		struct hfi1_netdev_priv *priv =
+			hfi1_netdev_priv(dd->dummy_netdev);
+
 		dd_dev_info(dd, "hfi1 netdev freed\n");
+		xa_destroy(&priv->dev_tbl);
 		kfree(dd->dummy_netdev);
 		dd->dummy_netdev = NULL;
 	}