diff mbox series

[for-next,8/9] RDMA/rxe: Report leaked objects

Message ID 20230721205021.5394-9-rpearsonhpe@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Jason Gunthorpe
Headers show
Series RDMA/rxe: Misc fixes and cleanups | expand

Commit Message

Bob Pearson July 21, 2023, 8:50 p.m. UTC
This patch gives a more detailed list of objects that are not
freed in case of error before the module exits.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Jason Gunthorpe July 31, 2023, 6:15 p.m. UTC | #1
On Fri, Jul 21, 2023 at 03:50:21PM -0500, Bob Pearson wrote:
> This patch gives a more detailed list of objects that are not
> freed in case of error before the module exits.
> 
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> index cb812bd969c6..3249c2741491 100644
> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
>  
>  void rxe_pool_cleanup(struct rxe_pool *pool)
>  {
> -	WARN_ON(!xa_empty(&pool->xa));
> +	unsigned long index;
> +	struct rxe_pool_elem *elem;
> +
> +	xa_lock(&pool->xa);
> +	xa_for_each(&pool->xa, index, elem) {
> +		rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name,
> +				elem->index);
> +	}
> +	xa_unlock(&pool->xa);
> +
> +	xa_destroy(&pool->xa);
>  }

Is this why? Just count the number of unfinalized objects and report
if there is difference, don't mess up the xarray.

Jason
Bob Pearson July 31, 2023, 6:23 p.m. UTC | #2
On 7/31/23 13:15, Jason Gunthorpe wrote:
> On Fri, Jul 21, 2023 at 03:50:21PM -0500, Bob Pearson wrote:
>> This patch gives a more detailed list of objects that are not
>> freed in case of error before the module exits.
>>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
>> index cb812bd969c6..3249c2741491 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
>> @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
>>  
>>  void rxe_pool_cleanup(struct rxe_pool *pool)
>>  {
>> -	WARN_ON(!xa_empty(&pool->xa));
>> +	unsigned long index;
>> +	struct rxe_pool_elem *elem;
>> +
>> +	xa_lock(&pool->xa);
>> +	xa_for_each(&pool->xa, index, elem) {
>> +		rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name,
>> +				elem->index);
>> +	}
>> +	xa_unlock(&pool->xa);
>> +
>> +	xa_destroy(&pool->xa);
>>  }
> 
> Is this why? Just count the number of unfinalized objects and report
> if there is difference, don't mess up the xarray.
> 
> Jason
This is why I made the last change but I really didn't like that there was no
way to lookup the qp from its index since we were using a NULL xarray entry to
indicate the state of the qp. Making it explicit, i.e. a variable in the struct
seems much more straight forward. Not sure why you hated the last change.

Bob
Jason Gunthorpe July 31, 2023, 6:31 p.m. UTC | #3
On Mon, Jul 31, 2023 at 01:23:59PM -0500, Bob Pearson wrote:
> On 7/31/23 13:15, Jason Gunthorpe wrote:
> > On Fri, Jul 21, 2023 at 03:50:21PM -0500, Bob Pearson wrote:
> >> This patch gives a more detailed list of objects that are not
> >> freed in case of error before the module exits.
> >>
> >> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> >> ---
> >>  drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++-
> >>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> >> index cb812bd969c6..3249c2741491 100644
> >> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
> >> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> >> @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
> >>  
> >>  void rxe_pool_cleanup(struct rxe_pool *pool)
> >>  {
> >> -	WARN_ON(!xa_empty(&pool->xa));
> >> +	unsigned long index;
> >> +	struct rxe_pool_elem *elem;
> >> +
> >> +	xa_lock(&pool->xa);
> >> +	xa_for_each(&pool->xa, index, elem) {
> >> +		rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name,
> >> +				elem->index);
> >> +	}
> >> +	xa_unlock(&pool->xa);
> >> +
> >> +	xa_destroy(&pool->xa);
> >>  }
> > 
> > Is this why? Just count the number of unfinalized objects and report
> > if there is difference, don't mess up the xarray.
> > 
> > Jason
> This is why I made the last change but I really didn't like that there was no
> way to lookup the qp from its index since we were using a NULL xarray entry to
> indicate the state of the qp. Making it explicit, i.e. a variable in the struct
> seems much more straight forward. Not sure why you hated the last
> change.

Because if you don't call finalize abort has to be deterministic, and
abort can't be that if it someone else can access the poitner and, eg,
take a reference.

It breaks the entire model of how the memory ownership works during creation.

Jason
Bob Pearson July 31, 2023, 6:42 p.m. UTC | #4
On 7/31/23 13:31, Jason Gunthorpe wrote:
> On Mon, Jul 31, 2023 at 01:23:59PM -0500, Bob Pearson wrote:
>> On 7/31/23 13:15, Jason Gunthorpe wrote:
>>> On Fri, Jul 21, 2023 at 03:50:21PM -0500, Bob Pearson wrote:
>>>> This patch gives a more detailed list of objects that are not
>>>> freed in case of error before the module exits.
>>>>
>>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>>> ---
>>>>  drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++-
>>>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
>>>> index cb812bd969c6..3249c2741491 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
>>>> @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
>>>>  
>>>>  void rxe_pool_cleanup(struct rxe_pool *pool)
>>>>  {
>>>> -	WARN_ON(!xa_empty(&pool->xa));
>>>> +	unsigned long index;
>>>> +	struct rxe_pool_elem *elem;
>>>> +
>>>> +	xa_lock(&pool->xa);
>>>> +	xa_for_each(&pool->xa, index, elem) {
>>>> +		rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name,
>>>> +				elem->index);
>>>> +	}
>>>> +	xa_unlock(&pool->xa);
>>>> +
>>>> +	xa_destroy(&pool->xa);
>>>>  }
>>>
>>> Is this why? Just count the number of unfinalized objects and report
>>> if there is difference, don't mess up the xarray.
>>>
>>> Jason
>> This is why I made the last change but I really didn't like that there was no
>> way to lookup the qp from its index since we were using a NULL xarray entry to
>> indicate the state of the qp. Making it explicit, i.e. a variable in the struct
>> seems much more straight forward. Not sure why you hated the last
>> change.
> 
> Because if you don't call finalize abort has to be deterministic, and
> abort can't be that if it someone else can access the poitner and, eg,
> take a reference.

rxe_pool_get_index() is the only 'correct' way to look up the pointer and
it checks the valid state (now). Scanning the xarray or just looking up
the qp is something outside the scope of the normal flows. Like listing
orphan objects on module exit.

Memory ownership didn't change. It is still the same. The only change is
how we mark whether the object is valid for lookup.

Bob
> 
> It breaks the entire model of how the memory ownership works during creation.
> 
> Jason
Jason Gunthorpe July 31, 2023, 6:43 p.m. UTC | #5
On Mon, Jul 31, 2023 at 01:42:09PM -0500, Bob Pearson wrote:
> On 7/31/23 13:31, Jason Gunthorpe wrote:
> > On Mon, Jul 31, 2023 at 01:23:59PM -0500, Bob Pearson wrote:
> >> On 7/31/23 13:15, Jason Gunthorpe wrote:
> >>> On Fri, Jul 21, 2023 at 03:50:21PM -0500, Bob Pearson wrote:
> >>>> This patch gives a more detailed list of objects that are not
> >>>> freed in case of error before the module exits.
> >>>>
> >>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> >>>> ---
> >>>>  drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++-
> >>>>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> >>>> index cb812bd969c6..3249c2741491 100644
> >>>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
> >>>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> >>>> @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
> >>>>  
> >>>>  void rxe_pool_cleanup(struct rxe_pool *pool)
> >>>>  {
> >>>> -	WARN_ON(!xa_empty(&pool->xa));
> >>>> +	unsigned long index;
> >>>> +	struct rxe_pool_elem *elem;
> >>>> +
> >>>> +	xa_lock(&pool->xa);
> >>>> +	xa_for_each(&pool->xa, index, elem) {
> >>>> +		rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name,
> >>>> +				elem->index);
> >>>> +	}
> >>>> +	xa_unlock(&pool->xa);
> >>>> +
> >>>> +	xa_destroy(&pool->xa);
> >>>>  }
> >>>
> >>> Is this why? Just count the number of unfinalized objects and report
> >>> if there is difference, don't mess up the xarray.
> >>>
> >>> Jason
> >> This is why I made the last change but I really didn't like that there was no
> >> way to lookup the qp from its index since we were using a NULL xarray entry to
> >> indicate the state of the qp. Making it explicit, i.e. a variable in the struct
> >> seems much more straight forward. Not sure why you hated the last
> >> change.
> > 
> > Because if you don't call finalize abort has to be deterministic, and
> > abort can't be that if it someone else can access the poitner and, eg,
> > take a reference.
> 
> rxe_pool_get_index() is the only 'correct' way to look up the pointer and
> it checks the valid state (now). Scanning the xarray or just looking up
> the qp is something outside the scope of the normal flows. Like listing
> orphan objects on module exit.

Maybe at this instance, but people keep changing this and it is
fundamentally wrong to store a pointer to an incompletely initialized
memory for other threads to see.

Especially for some minor debugging feature.

Jason
Bob Pearson July 31, 2023, 6:51 p.m. UTC | #6
On 7/31/23 13:43, Jason Gunthorpe wrote:
> On Mon, Jul 31, 2023 at 01:42:09PM -0500, Bob Pearson wrote:
>> On 7/31/23 13:31, Jason Gunthorpe wrote:
>>> On Mon, Jul 31, 2023 at 01:23:59PM -0500, Bob Pearson wrote:
>>>> On 7/31/23 13:15, Jason Gunthorpe wrote:
>>>>> On Fri, Jul 21, 2023 at 03:50:21PM -0500, Bob Pearson wrote:
>>>>>> This patch gives a more detailed list of objects that are not
>>>>>> freed in case of error before the module exits.
>>>>>>
>>>>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>>>>> ---
>>>>>>  drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++-
>>>>>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
>>>>>> index cb812bd969c6..3249c2741491 100644
>>>>>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
>>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
>>>>>> @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
>>>>>>  
>>>>>>  void rxe_pool_cleanup(struct rxe_pool *pool)
>>>>>>  {
>>>>>> -	WARN_ON(!xa_empty(&pool->xa));
>>>>>> +	unsigned long index;
>>>>>> +	struct rxe_pool_elem *elem;
>>>>>> +
>>>>>> +	xa_lock(&pool->xa);
>>>>>> +	xa_for_each(&pool->xa, index, elem) {
>>>>>> +		rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name,
>>>>>> +				elem->index);
>>>>>> +	}
>>>>>> +	xa_unlock(&pool->xa);
>>>>>> +
>>>>>> +	xa_destroy(&pool->xa);
>>>>>>  }
>>>>>
>>>>> Is this why? Just count the number of unfinalized objects and report
>>>>> if there is difference, don't mess up the xarray.
>>>>>
>>>>> Jason
>>>> This is why I made the last change but I really didn't like that there was no
>>>> way to lookup the qp from its index since we were using a NULL xarray entry to
>>>> indicate the state of the qp. Making it explicit, i.e. a variable in the struct
>>>> seems much more straight forward. Not sure why you hated the last
>>>> change.
>>>
>>> Because if you don't call finalize abort has to be deterministic, and
>>> abort can't be that if it someone else can access the poitner and, eg,
>>> take a reference.
>>
>> rxe_pool_get_index() is the only 'correct' way to look up the pointer and
>> it checks the valid state (now). Scanning the xarray or just looking up
>> the qp is something outside the scope of the normal flows. Like listing
>> orphan objects on module exit.
> 
> Maybe at this instance, but people keep changing this and it is
> fundamentally wrong to store a pointer to an incompletely initialized
> memory for other threads to see.
> 
> Especially for some minor debugging feature.

Maybe warning comments would help. There are lots of ways to break the code, sigh.
The typical one is someone makes a change that breaks reference counting.
> 
> Jason
Jason Gunthorpe Aug. 4, 2023, 2:16 p.m. UTC | #7
On Mon, Jul 31, 2023 at 01:51:59PM -0500, Bob Pearson wrote:
> On 7/31/23 13:43, Jason Gunthorpe wrote:
> > On Mon, Jul 31, 2023 at 01:42:09PM -0500, Bob Pearson wrote:
> >> On 7/31/23 13:31, Jason Gunthorpe wrote:
> >>> On Mon, Jul 31, 2023 at 01:23:59PM -0500, Bob Pearson wrote:
> >>>> On 7/31/23 13:15, Jason Gunthorpe wrote:
> >>>>> On Fri, Jul 21, 2023 at 03:50:21PM -0500, Bob Pearson wrote:
> >>>>>> This patch gives a more detailed list of objects that are not
> >>>>>> freed in case of error before the module exits.
> >>>>>>
> >>>>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> >>>>>> ---
> >>>>>>  drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++-
> >>>>>>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> >>>>>> index cb812bd969c6..3249c2741491 100644
> >>>>>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
> >>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> >>>>>> @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
> >>>>>>  
> >>>>>>  void rxe_pool_cleanup(struct rxe_pool *pool)
> >>>>>>  {
> >>>>>> -	WARN_ON(!xa_empty(&pool->xa));
> >>>>>> +	unsigned long index;
> >>>>>> +	struct rxe_pool_elem *elem;
> >>>>>> +
> >>>>>> +	xa_lock(&pool->xa);
> >>>>>> +	xa_for_each(&pool->xa, index, elem) {
> >>>>>> +		rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name,
> >>>>>> +				elem->index);
> >>>>>> +	}
> >>>>>> +	xa_unlock(&pool->xa);
> >>>>>> +
> >>>>>> +	xa_destroy(&pool->xa);
> >>>>>>  }
> >>>>>
> >>>>> Is this why? Just count the number of unfinalized objects and report
> >>>>> if there is difference, don't mess up the xarray.
> >>>>>
> >>>>> Jason
> >>>> This is why I made the last change but I really didn't like that there was no
> >>>> way to lookup the qp from its index since we were using a NULL xarray entry to
> >>>> indicate the state of the qp. Making it explicit, i.e. a variable in the struct
> >>>> seems much more straight forward. Not sure why you hated the last
> >>>> change.
> >>>
> >>> Because if you don't call finalize abort has to be deterministic, and
> >>> abort can't be that if it someone else can access the poitner and, eg,
> >>> take a reference.
> >>
> >> rxe_pool_get_index() is the only 'correct' way to look up the pointer and
> >> it checks the valid state (now). Scanning the xarray or just looking up
> >> the qp is something outside the scope of the normal flows. Like listing
> >> orphan objects on module exit.
> > 
> > Maybe at this instance, but people keep changing this and it is
> > fundamentally wrong to store a pointer to an incompletely initialized
> > memory for other threads to see.
> > 
> > Especially for some minor debugging feature.
> 
> Maybe warning comments would help. There are lots of ways to break the code, sigh.
> The typical one is someone makes a change that breaks reference
> counting.

Don't do it wrong in the first place is the most important thing.

Don't put incompletely intialized objects in global memory. It is a
very simple rule.

Jason
diff mbox series

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index cb812bd969c6..3249c2741491 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -113,7 +113,17 @@  void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
 
 void rxe_pool_cleanup(struct rxe_pool *pool)
 {
-	WARN_ON(!xa_empty(&pool->xa));
+	unsigned long index;
+	struct rxe_pool_elem *elem;
+
+	xa_lock(&pool->xa);
+	xa_for_each(&pool->xa, index, elem) {
+		rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name,
+				elem->index);
+	}
+	xa_unlock(&pool->xa);
+
+	xa_destroy(&pool->xa);
 }
 
 int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem,