Message ID | 20230721205021.5394-9-rpearsonhpe@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Jason Gunthorpe |
Headers | show |
Series | RDMA/rxe: Misc fixes and cleanups | expand |
On Fri, Jul 21, 2023 at 03:50:21PM -0500, Bob Pearson wrote: > This patch gives a more detailed list of objects that are not > freed in case of error before the module exits. > > Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> > --- > drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c > index cb812bd969c6..3249c2741491 100644 > --- a/drivers/infiniband/sw/rxe/rxe_pool.c > +++ b/drivers/infiniband/sw/rxe/rxe_pool.c > @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool, > > void rxe_pool_cleanup(struct rxe_pool *pool) > { > - WARN_ON(!xa_empty(&pool->xa)); > + unsigned long index; > + struct rxe_pool_elem *elem; > + > + xa_lock(&pool->xa); > + xa_for_each(&pool->xa, index, elem) { > + rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name, > + elem->index); > + } > + xa_unlock(&pool->xa); > + > + xa_destroy(&pool->xa); > } Is this why? Just count the number of unfinalized objects and report if there is difference, don't mess up the xarray. Jason
On 7/31/23 13:15, Jason Gunthorpe wrote: > On Fri, Jul 21, 2023 at 03:50:21PM -0500, Bob Pearson wrote: >> This patch gives a more detailed list of objects that are not >> freed in case of error before the module exits. >> >> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> >> --- >> drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++- >> 1 file changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c >> index cb812bd969c6..3249c2741491 100644 >> --- a/drivers/infiniband/sw/rxe/rxe_pool.c >> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c >> @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool, >> >> void rxe_pool_cleanup(struct rxe_pool *pool) >> { >> - WARN_ON(!xa_empty(&pool->xa)); >> + unsigned long index; >> + struct rxe_pool_elem *elem; >> + >> + xa_lock(&pool->xa); >> + xa_for_each(&pool->xa, index, elem) { >> + rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name, >> + elem->index); >> + } >> + xa_unlock(&pool->xa); >> + >> + xa_destroy(&pool->xa); >> } > > Is this why? Just count the number of unfinalized objects and report > if there is difference, don't mess up the xarray. > > Jason This is why I made the last change but I really didn't like that there was no way to lookup the qp from its index since we were using a NULL xarray entry to indicate the state of the qp. Making it explicit, i.e. a variable in the struct seems much more straight forward. Not sure why you hated the last change. Bob
On Mon, Jul 31, 2023 at 01:23:59PM -0500, Bob Pearson wrote: > On 7/31/23 13:15, Jason Gunthorpe wrote: > > On Fri, Jul 21, 2023 at 03:50:21PM -0500, Bob Pearson wrote: > >> This patch gives a more detailed list of objects that are not > >> freed in case of error before the module exits. > >> > >> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> > >> --- > >> drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++- > >> 1 file changed, 11 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c > >> index cb812bd969c6..3249c2741491 100644 > >> --- a/drivers/infiniband/sw/rxe/rxe_pool.c > >> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c > >> @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool, > >> > >> void rxe_pool_cleanup(struct rxe_pool *pool) > >> { > >> - WARN_ON(!xa_empty(&pool->xa)); > >> + unsigned long index; > >> + struct rxe_pool_elem *elem; > >> + > >> + xa_lock(&pool->xa); > >> + xa_for_each(&pool->xa, index, elem) { > >> + rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name, > >> + elem->index); > >> + } > >> + xa_unlock(&pool->xa); > >> + > >> + xa_destroy(&pool->xa); > >> } > > > > Is this why? Just count the number of unfinalized objects and report > > if there is difference, don't mess up the xarray. > > > > Jason > This is why I made the last change but I really didn't like that there was no > way to lookup the qp from its index since we were using a NULL xarray entry to > indicate the state of the qp. Making it explicit, i.e. a variable in the struct > seems much more straight forward. Not sure why you hated the last > change. Because if you don't call finalize abort has to be deterministic, and abort can't be that if it someone else can access the poitner and, eg, take a reference. It breaks the entire model of how the memory ownership works during creation. Jason
On 7/31/23 13:31, Jason Gunthorpe wrote: > On Mon, Jul 31, 2023 at 01:23:59PM -0500, Bob Pearson wrote: >> On 7/31/23 13:15, Jason Gunthorpe wrote: >>> On Fri, Jul 21, 2023 at 03:50:21PM -0500, Bob Pearson wrote: >>>> This patch gives a more detailed list of objects that are not >>>> freed in case of error before the module exits. >>>> >>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> >>>> --- >>>> drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++- >>>> 1 file changed, 11 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c >>>> index cb812bd969c6..3249c2741491 100644 >>>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c >>>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c >>>> @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool, >>>> >>>> void rxe_pool_cleanup(struct rxe_pool *pool) >>>> { >>>> - WARN_ON(!xa_empty(&pool->xa)); >>>> + unsigned long index; >>>> + struct rxe_pool_elem *elem; >>>> + >>>> + xa_lock(&pool->xa); >>>> + xa_for_each(&pool->xa, index, elem) { >>>> + rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name, >>>> + elem->index); >>>> + } >>>> + xa_unlock(&pool->xa); >>>> + >>>> + xa_destroy(&pool->xa); >>>> } >>> >>> Is this why? Just count the number of unfinalized objects and report >>> if there is difference, don't mess up the xarray. >>> >>> Jason >> This is why I made the last change but I really didn't like that there was no >> way to lookup the qp from its index since we were using a NULL xarray entry to >> indicate the state of the qp. Making it explicit, i.e. a variable in the struct >> seems much more straight forward. Not sure why you hated the last >> change. > > Because if you don't call finalize abort has to be deterministic, and > abort can't be that if it someone else can access the poitner and, eg, > take a reference. rxe_pool_get_index() is the only 'correct' way to look up the pointer and it checks the valid state (now). Scanning the xarray or just looking up the qp is something outside the scope of the normal flows. Like listing orphan objects on module exit. Memory ownership didn't change. It is still the same. The only change is how we mark whether the object is valid for lookup. Bob > > It breaks the entire model of how the memory ownership works during creation. > > Jason
On Mon, Jul 31, 2023 at 01:42:09PM -0500, Bob Pearson wrote: > On 7/31/23 13:31, Jason Gunthorpe wrote: > > On Mon, Jul 31, 2023 at 01:23:59PM -0500, Bob Pearson wrote: > >> On 7/31/23 13:15, Jason Gunthorpe wrote: > >>> On Fri, Jul 21, 2023 at 03:50:21PM -0500, Bob Pearson wrote: > >>>> This patch gives a more detailed list of objects that are not > >>>> freed in case of error before the module exits. > >>>> > >>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> > >>>> --- > >>>> drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++- > >>>> 1 file changed, 11 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c > >>>> index cb812bd969c6..3249c2741491 100644 > >>>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c > >>>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c > >>>> @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool, > >>>> > >>>> void rxe_pool_cleanup(struct rxe_pool *pool) > >>>> { > >>>> - WARN_ON(!xa_empty(&pool->xa)); > >>>> + unsigned long index; > >>>> + struct rxe_pool_elem *elem; > >>>> + > >>>> + xa_lock(&pool->xa); > >>>> + xa_for_each(&pool->xa, index, elem) { > >>>> + rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name, > >>>> + elem->index); > >>>> + } > >>>> + xa_unlock(&pool->xa); > >>>> + > >>>> + xa_destroy(&pool->xa); > >>>> } > >>> > >>> Is this why? Just count the number of unfinalized objects and report > >>> if there is difference, don't mess up the xarray. > >>> > >>> Jason > >> This is why I made the last change but I really didn't like that there was no > >> way to lookup the qp from its index since we were using a NULL xarray entry to > >> indicate the state of the qp. Making it explicit, i.e. a variable in the struct > >> seems much more straight forward. Not sure why you hated the last > >> change. > > > > Because if you don't call finalize abort has to be deterministic, and > > abort can't be that if it someone else can access the poitner and, eg, > > take a reference. > > rxe_pool_get_index() is the only 'correct' way to look up the pointer and > it checks the valid state (now). Scanning the xarray or just looking up > the qp is something outside the scope of the normal flows. Like listing > orphan objects on module exit. Maybe at this instance, but people keep changing this and it is fundamentally wrong to store a pointer to an incompletely initialized memory for other threads to see. Especially for some minor debugging feature. Jason
On 7/31/23 13:43, Jason Gunthorpe wrote: > On Mon, Jul 31, 2023 at 01:42:09PM -0500, Bob Pearson wrote: >> On 7/31/23 13:31, Jason Gunthorpe wrote: >>> On Mon, Jul 31, 2023 at 01:23:59PM -0500, Bob Pearson wrote: >>>> On 7/31/23 13:15, Jason Gunthorpe wrote: >>>>> On Fri, Jul 21, 2023 at 03:50:21PM -0500, Bob Pearson wrote: >>>>>> This patch gives a more detailed list of objects that are not >>>>>> freed in case of error before the module exits. >>>>>> >>>>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> >>>>>> --- >>>>>> drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++- >>>>>> 1 file changed, 11 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c >>>>>> index cb812bd969c6..3249c2741491 100644 >>>>>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c >>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c >>>>>> @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool, >>>>>> >>>>>> void rxe_pool_cleanup(struct rxe_pool *pool) >>>>>> { >>>>>> - WARN_ON(!xa_empty(&pool->xa)); >>>>>> + unsigned long index; >>>>>> + struct rxe_pool_elem *elem; >>>>>> + >>>>>> + xa_lock(&pool->xa); >>>>>> + xa_for_each(&pool->xa, index, elem) { >>>>>> + rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name, >>>>>> + elem->index); >>>>>> + } >>>>>> + xa_unlock(&pool->xa); >>>>>> + >>>>>> + xa_destroy(&pool->xa); >>>>>> } >>>>> >>>>> Is this why? Just count the number of unfinalized objects and report >>>>> if there is difference, don't mess up the xarray. >>>>> >>>>> Jason >>>> This is why I made the last change but I really didn't like that there was no >>>> way to lookup the qp from its index since we were using a NULL xarray entry to >>>> indicate the state of the qp. Making it explicit, i.e. a variable in the struct >>>> seems much more straight forward. Not sure why you hated the last >>>> change. >>> >>> Because if you don't call finalize abort has to be deterministic, and >>> abort can't be that if it someone else can access the poitner and, eg, >>> take a reference. >> >> rxe_pool_get_index() is the only 'correct' way to look up the pointer and >> it checks the valid state (now). Scanning the xarray or just looking up >> the qp is something outside the scope of the normal flows. Like listing >> orphan objects on module exit. > > Maybe at this instance, but people keep changing this and it is > fundamentally wrong to store a pointer to an incompletely initialized > memory for other threads to see. > > Especially for some minor debugging feature. Maybe warning comments would help. There are lots of ways to break the code, sigh. The typical one is someone makes a change that breaks reference counting. > > Jason
On Mon, Jul 31, 2023 at 01:51:59PM -0500, Bob Pearson wrote: > On 7/31/23 13:43, Jason Gunthorpe wrote: > > On Mon, Jul 31, 2023 at 01:42:09PM -0500, Bob Pearson wrote: > >> On 7/31/23 13:31, Jason Gunthorpe wrote: > >>> On Mon, Jul 31, 2023 at 01:23:59PM -0500, Bob Pearson wrote: > >>>> On 7/31/23 13:15, Jason Gunthorpe wrote: > >>>>> On Fri, Jul 21, 2023 at 03:50:21PM -0500, Bob Pearson wrote: > >>>>>> This patch gives a more detailed list of objects that are not > >>>>>> freed in case of error before the module exits. > >>>>>> > >>>>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> > >>>>>> --- > >>>>>> drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++- > >>>>>> 1 file changed, 11 insertions(+), 1 deletion(-) > >>>>>> > >>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c > >>>>>> index cb812bd969c6..3249c2741491 100644 > >>>>>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c > >>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c > >>>>>> @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool, > >>>>>> > >>>>>> void rxe_pool_cleanup(struct rxe_pool *pool) > >>>>>> { > >>>>>> - WARN_ON(!xa_empty(&pool->xa)); > >>>>>> + unsigned long index; > >>>>>> + struct rxe_pool_elem *elem; > >>>>>> + > >>>>>> + xa_lock(&pool->xa); > >>>>>> + xa_for_each(&pool->xa, index, elem) { > >>>>>> + rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name, > >>>>>> + elem->index); > >>>>>> + } > >>>>>> + xa_unlock(&pool->xa); > >>>>>> + > >>>>>> + xa_destroy(&pool->xa); > >>>>>> } > >>>>> > >>>>> Is this why? Just count the number of unfinalized objects and report > >>>>> if there is difference, don't mess up the xarray. > >>>>> > >>>>> Jason > >>>> This is why I made the last change but I really didn't like that there was no > >>>> way to lookup the qp from its index since we were using a NULL xarray entry to > >>>> indicate the state of the qp. Making it explicit, i.e. a variable in the struct > >>>> seems much more straight forward. Not sure why you hated the last > >>>> change. > >>> > >>> Because if you don't call finalize abort has to be deterministic, and > >>> abort can't be that if it someone else can access the poitner and, eg, > >>> take a reference. > >> > >> rxe_pool_get_index() is the only 'correct' way to look up the pointer and > >> it checks the valid state (now). Scanning the xarray or just looking up > >> the qp is something outside the scope of the normal flows. Like listing > >> orphan objects on module exit. > > > > Maybe at this instance, but people keep changing this and it is > > fundamentally wrong to store a pointer to an incompletely initialized > > memory for other threads to see. > > > > Especially for some minor debugging feature. > > Maybe warning comments would help. There are lots of ways to break the code, sigh. > The typical one is someone makes a change that breaks reference > counting. Don't do it wrong in the first place is the most important thing. Don't put incompletely intialized objects in global memory. It is a very simple rule. Jason
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c index cb812bd969c6..3249c2741491 100644 --- a/drivers/infiniband/sw/rxe/rxe_pool.c +++ b/drivers/infiniband/sw/rxe/rxe_pool.c @@ -113,7 +113,17 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool, void rxe_pool_cleanup(struct rxe_pool *pool) { - WARN_ON(!xa_empty(&pool->xa)); + unsigned long index; + struct rxe_pool_elem *elem; + + xa_lock(&pool->xa); + xa_for_each(&pool->xa, index, elem) { + rxe_err_dev(pool->rxe, "%s#%d: Leaked", pool->name, + elem->index); + } + xa_unlock(&pool->xa); + + xa_destroy(&pool->xa); } int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem,
This patch gives a more detailed list of objects that are not freed in case of error before the module exits. Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> --- drivers/infiniband/sw/rxe/rxe_pool.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)