Message ID | 1617025700-31865-2-git-send-email-dennis.dalessandro@cornelisnetworks.com (mailing list archive) |
---|---|
State | Rejected |
Delegated to: | Jason Gunthorpe |
Series | hfi fixes |
On Mon, Mar 29, 2021 at 09:48:17AM -0400, dennis.dalessandro@cornelisnetworks.com wrote:

> diff --git a/drivers/infiniband/hw/hfi1/netdev_rx.c b/drivers/infiniband/hw/hfi1/netdev_rx.c
> index 2c8bc02..cec02e8 100644
> +++ b/drivers/infiniband/hw/hfi1/netdev_rx.c
> @@ -372,7 +372,11 @@ int hfi1_netdev_alloc(struct hfi1_devdata *dd)
>  void hfi1_netdev_free(struct hfi1_devdata *dd)
>  {
>  	if (dd->dummy_netdev) {
> +		struct hfi1_netdev_priv *priv =
> +			hfi1_netdev_priv(dd->dummy_netdev);
> +
>  		dd_dev_info(dd, "hfi1 netdev freed\n");
> +		xa_destroy(&priv->dev_tbl);
>  		kfree(dd->dummy_netdev);
>  		dd->dummy_netdev = NULL;

This is doing kfree() on a struct net_device?? Huh?

You should have put this in your own struct and used container_of, not
co-opted netdev_priv, then free your own struct.

It is a bit weird to see an xa_destroy like this. How did things get to
the point that no concurrent thread can see the xarray but there is
still stuff stored in it?

And it is weird that this is storing two different types in it too, with
no refcounting.

Jason
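For readers unfamiliar with the pattern suggested above, here is a minimal userspace sketch of it. It is not the hfi1 code. Instead of allocating a bare net_device and reaching the driver state through netdev_priv(), the driver embeds the device in a struct it owns, gets back to that struct with container_of(), and frees the struct it allocated. `struct fake_netdev`, `struct demo_rx` and the `demo_rx_*` helpers are hypothetical names, and the macro is a simplified stand-in for the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct net_device; the real structure is far larger. */
struct fake_netdev {
	int ifindex;
};

/* Hypothetical driver-owned wrapper: the driver allocates and frees this. */
struct demo_rx {
	int num_rx_q;               /* driver state lives beside the netdev */
	struct fake_netdev rx_napi; /* embedded, not allocated on its own */
};

/* Allocate the wrapper; the embedded netdev comes along with it. */
static struct demo_rx *demo_rx_alloc(int num_rx_q)
{
	struct demo_rx *rx = calloc(1, sizeof(*rx));

	if (!rx)
		return NULL;
	rx->num_rx_q = num_rx_q;
	return rx;
}

/* Recover the wrapper from a pointer to the embedded netdev. */
static struct demo_rx *demo_rx_from_netdev(struct fake_netdev *ndev)
{
	return container_of(ndev, struct demo_rx, rx_napi);
}

/* Free the object that was actually allocated, never the embedded member. */
static void demo_rx_free(struct demo_rx *rx)
{
	free(rx);
}
```

With this layout the pointer handed to free() is always the one returned by the allocator, which is the property the review comment asks for.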
On 3/29/2021 10:09 AM, Jason Gunthorpe wrote:
> [...]
>
> This is doing kfree() on a struct net_device?? Huh?
>
> You should have put this in your own struct and used container_of not
> co-opted netdev_priv, then free your own struct.
>
> It is a bit weird to see a xa_destroy like this, how did things get to
> the point that no concurrent thread can see the xarray but there is
> still stuff stored in it?
>
> And it is weird this is storing two different types in it too, with no
> refcounting..

We do rework this stuff in the other patch series.

https://patchwork.kernel.org/project/linux-rdma/patch/1617026056-50483-11-git-send-email-dennis.dalessandro@cornelisnetworks.com/

If we fix it up in the for-next series, what should we do about stable?

-Denny
On Wed, Mar 31, 2021 at 03:36:14PM -0400, Dennis Dalessandro wrote:
> [...]
>
> We do rework this stuff in the other patch series.
>
> https://patchwork.kernel.org/project/linux-rdma/patch/1617026056-50483-11-git-send-email-dennis.dalessandro@cornelisnetworks.com/
>
> If we fix it up in the for-next series, what should we do about stable?

What does stable matter? Why can it not just take the same patches that
end up in Linus's tree?

thanks,

greg k-h
On Wed, Mar 31, 2021 at 03:36:14PM -0400, Dennis Dalessandro wrote:
> [...]
>
> We do rework this stuff in the other patch series.
>
> https://patchwork.kernel.org/project/linux-rdma/patch/1617026056-50483-11-git-send-email-dennis.dalessandro@cornelisnetworks.com/
>
> If we fix it up in the for-next series, what should we do about stable?

Well, if you are fixing bugs then order it bug fixes first, but this is
tagged for rc and you still need to explain what bug it is actually
fixing.

xa_destroy is not required if the xarray is already empty, so the commit
message at least needs to explain how we get to a point where it still
has something in it.

Jason
> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, April 01, 2021 8:33 AM
> To: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
> Cc: dledford@redhat.com; linux-rdma@vger.kernel.org; Wan, Kaike <kaike.wan@intel.com>; stable@vger.kernel.org
> Subject: Re: [PATCH for-rc 1/4] IB/hfi1: Call xa_destroy before freeing dummy_netdev
>
> [...]
>
> Well, if you are fixing bugs then order it bug fixes first, but this is tagged for rc
> and you still need to explain what bug it is actually fixing.
>
> xa_destroy is not required if the xarray is already empty, so the commit
> message at least needs to explain how we get to a point where it still has
> something in it.

[Wan, Kaike]
Shouldn't xa_destroy() always be called during cleanup, just in case
something is left behind? Check the following:

static void ib_device_release(struct device *device)
{
	....
	xa_destroy(&dev->compat_devs);
	xa_destroy(&dev->client_data);
	kfree_rcu(dev, rcu_head);
}

>
> Jason
On Thu, Apr 01, 2021 at 01:42:57PM +0000, Wan, Kaike wrote:

> Shouldn't xa_destroy() always be called during cleanup, just in case
> that something is left behind?

No.

> Check the following:

Since I didn't write a WARN_ON(!xa_empty()) it means they were not
made empty. IIRC there is some special stuff there with XA_ZERO_ENTRY
that causes it.

Jason
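The distinction being drawn here can be illustrated with a small userspace sketch. It assumes the design where every entry is erased by whoever stored it, so that teardown should find the table empty and a leftover entry points to a bug elsewhere rather than something to sweep away silently. `struct demo_tbl` and the `demo_tbl_*` helpers are hypothetical and only stand in for an xarray; in kernel code the corresponding check would be along the lines of the `WARN_ON(!xa_empty(...))` mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_SLOTS 8

/* Hypothetical fixed-size table standing in for an xarray. */
struct demo_tbl {
	void *slot[DEMO_SLOTS];
};

/* True when no slot holds an entry. */
static bool demo_tbl_empty(const struct demo_tbl *t)
{
	for (size_t i = 0; i < DEMO_SLOTS; i++)
		if (t->slot[i])
			return false;
	return true;
}

/* Store an entry; fails if the id is out of range or already in use. */
static bool demo_tbl_store(struct demo_tbl *t, size_t id, void *entry)
{
	if (id >= DEMO_SLOTS || t->slot[id])
		return false;
	t->slot[id] = entry;
	return true;
}

/* Remove and return the entry at id, or NULL if there was none. */
static void *demo_tbl_erase(struct demo_tbl *t, size_t id)
{
	void *old;

	if (id >= DEMO_SLOTS)
		return NULL;
	old = t->slot[id];
	t->slot[id] = NULL;
	return old;
}

/*
 * Teardown check: returns true in the expected case (table already empty).
 * A false return means some owner never erased its entry; report it
 * instead of hiding it.
 */
static bool demo_tbl_teardown(const struct demo_tbl *t)
{
	if (demo_tbl_empty(t))
		return true;
	fprintf(stderr, "demo_tbl: entries still present at teardown\n");
	return false;
}
```

Under this design a non-empty table at teardown is reported, which is also why a commit that adds an unconditional destroy is asked to explain how entries can still be present.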
On 4/1/2021 2:06 AM, Greg KH wrote:
> [...]
>
> What does stable matter? Why can it not just take the same patches that
> end up in Linus's tree?

Guess it's more of a general question. What is the best way to handle
things if the code changes drastically in Linus' tree, to the point
where the bug no longer exists there, but does in stable?

-Denny
On Thu, Apr 01, 2021 at 10:02:30AM -0400, Dennis Dalessandro wrote:
> [...]
>
> Guess it's more of a general question. What is the best way to handle things
> if the code changes drastically in Linus' tree, to the point where the bug
> no longer exists there, but does in stable?

Documentation/process/stable-kernel-rules.rst should be your first stop
for stuff like this. Why not just take those "drastic changes" into the
stable kernel as well?

If for some reason that is impossible, then just email a patch to stable
and document the heck out of why this is not in Linus's tree and what
you have done to ensure that this change is correct. And get the
maintainer to agree. And be ready to fix it up again afterward as 90%
of the time we do this, the "new patch" causes problems :)

thanks,

greg k-h
On 4/1/2021 10:12 AM, Greg KH wrote:
> [...]
>
> Documentation/process/stable-kernel-rules.rst should be your first stop
> for stuff like this. Why not just take those "drastic changes" into the
> stable kernel as well?

Yep, indeed it was my first stop :) and right at the top, it cannot be
bigger than 100 lines, must fix only one thing, etc etc. That's what got
me wondering about all this.

> If for some reason that is impossible, then just email a patch to stable
> and document the heck out of why this is not in Linus's tree and what
> you have done to ensure that this change is correct. And get the
> maintainer to agree. And be ready to fix it up again afterward as 90%
> of the time we do this, the "new patch" causes problems :)

Makes total sense. Definitely not the route we want to take, and not
applicable for this current patch anyway. Appreciate the advice!

-Denny
diff --git a/drivers/infiniband/hw/hfi1/netdev_rx.c b/drivers/infiniband/hw/hfi1/netdev_rx.c
index 2c8bc02..cec02e8 100644
--- a/drivers/infiniband/hw/hfi1/netdev_rx.c
+++ b/drivers/infiniband/hw/hfi1/netdev_rx.c
@@ -372,7 +372,11 @@ int hfi1_netdev_alloc(struct hfi1_devdata *dd)
 void hfi1_netdev_free(struct hfi1_devdata *dd)
 {
 	if (dd->dummy_netdev) {
+		struct hfi1_netdev_priv *priv =
+			hfi1_netdev_priv(dd->dummy_netdev);
+
 		dd_dev_info(dd, "hfi1 netdev freed\n");
+		xa_destroy(&priv->dev_tbl);
 		kfree(dd->dummy_netdev);
 		dd->dummy_netdev = NULL;
 	}