[for-next,v8,00/10] RDMA/rxe: Implement memory windows

Message ID 20210525213751.629017-1-rpearsonhpe@gmail.com (mailing list archive)

Message

Bob Pearson May 25, 2021, 9:37 p.m. UTC
This series of patches implements memory windows for the rdma_rxe
driver. It is a shorter reimplementation of an earlier patch set.
The patches apply to, and depend on, the current for-next linux-rdma tree.
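
Below is a minimal userspace sketch of the consumer-side flow these patches
enable through the standard libibverbs API: allocate a type 1 memory window
and bind it over an existing MR with ibv_bind_mw(). It is only an
illustration (the helper name is invented, error handling is trimmed, and it
is not the matching rdma-core provider change), but it shows which verbs the
series wires up.

#include <stddef.h>
#include <stdint.h>
#include <infiniband/verbs.h>

/* Sketch only: pd, qp and mr are assumed to exist already, buf/len describe
 * the region registered by mr, and mr is assumed to have been registered
 * with IBV_ACCESS_MW_BIND.
 */
static int bind_type1_mw(struct ibv_pd *pd, struct ibv_qp *qp,
			 struct ibv_mr *mr, void *buf, size_t len)
{
	struct ibv_mw *mw;
	struct ibv_mw_bind bind = {
		.wr_id = 1,
		.send_flags = IBV_SEND_SIGNALED,
		.bind_info = {
			.mr = mr,
			.addr = (uint64_t)(uintptr_t)buf,
			.length = len,
			.mw_access_flags = IBV_ACCESS_REMOTE_READ |
					   IBV_ACCESS_REMOTE_WRITE,
		},
	};
	int ret;

	mw = ibv_alloc_mw(pd, IBV_MW_TYPE_1);
	if (!mw)
		return -1;

	/* Posts a bind WR on the QP; once it completes, remote peers can
	 * access the bound range through mw->rkey.
	 */
	ret = ibv_bind_mw(qp, mw, &bind);
	if (ret) {
		ibv_dealloc_mw(mw);
		return ret;
	}

	return 0;
}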

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
v8:
  Dropped wr.mw.flags in the rxe_send_wr struct in rdma_user_rxe.h.
v7:
  Fixed a duplicate INIT_RDMA_OBJ_SIZE(ib_mw, ...) in rxe_verbs.c.
v6:
  Added rxe_ prefix to subroutine names in lines that changed
  from Zhu's review of v5.
v5:
  Fixed a typo in 10th patch.
v4:
  Added a 10th patch to check when MRs have bound MWs
  and disallow dereg and invalidate operations.
v3:
  cleaned up void return and lower case enums from
  Zhu's review.
v2:
  cleaned up an issue in rdma_user_rxe.h
  cleaned up a collision in rxe_resp.c

Bob Pearson (10):
  RDMA/rxe: Add bind MW fields to rxe_send_wr
  RDMA/rxe: Return errors for add index and key
  RDMA/rxe: Enable MW object pool
  RDMA/rxe: Add ib_alloc_mw and ib_dealloc_mw verbs
  RDMA/rxe: Replace WR_REG_MASK by WR_LOCAL_OP_MASK
  RDMA/rxe: Move local ops to subroutine
  RDMA/rxe: Add support for bind MW work requests
  RDMA/rxe: Implement invalidate MW operations
  RDMA/rxe: Implement memory access through MWs
  RDMA/rxe: Disallow MR dereg and invalidate when bound

 drivers/infiniband/sw/rxe/Makefile     |   1 +
 drivers/infiniband/sw/rxe/rxe.c        |   1 +
 drivers/infiniband/sw/rxe/rxe_comp.c   |   5 +-
 drivers/infiniband/sw/rxe/rxe_loc.h    |  36 +--
 drivers/infiniband/sw/rxe/rxe_mr.c     | 126 ++++++---
 drivers/infiniband/sw/rxe/rxe_mw.c     | 343 +++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_opcode.c |  11 +-
 drivers/infiniband/sw/rxe/rxe_opcode.h |   3 +-
 drivers/infiniband/sw/rxe/rxe_param.h  |  19 +-
 drivers/infiniband/sw/rxe/rxe_pool.c   |  45 ++--
 drivers/infiniband/sw/rxe/rxe_pool.h   |   8 +-
 drivers/infiniband/sw/rxe/rxe_req.c    | 104 +++++---
 drivers/infiniband/sw/rxe/rxe_resp.c   | 111 +++++---
 drivers/infiniband/sw/rxe/rxe_verbs.c  |  15 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h  |  48 +++-
 include/uapi/rdma/rdma_user_rxe.h      |  10 +
 16 files changed, 702 insertions(+), 184 deletions(-)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_mw.c

Comments

Jason Gunthorpe June 3, 2021, 6:58 p.m. UTC | #1
On Tue, May 25, 2021 at 04:37:42PM -0500, Bob Pearson wrote:
> This series of patches implement memory windows for the rdma_rxe
> driver. This is a shorter reimplementation of an earlier patch set.
> They apply to and depend on the current for-next linux rdma tree.
> 
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
> v8:
>   Dropped wr.mw.flags in the rxe_send_wr struct in rdma_user_rxe.h.
> v7:
>   Fixed a duplicate INIT_RDMA_OBJ_SIZE(ib_mw, ...) in rxe_verbs.c.
> v6:
>   Added rxe_ prefix to subroutine names in lines that changed
>   from Zhu's review of v5.
> v5:
>   Fixed a typo in 10th patch.
> v4:
>   Added a 10th patch to check when MRs have bound MWs
>   and disallow dereg and invalidate operations.
> v3:
>   cleaned up void return and lower case enums from
>   Zhu's review.
> v2:
>   cleaned up an issue in rdma_user_rxe.h
>   cleaned up a collision in rxe_resp.c
> 
> Bob Pearson (10):
>   RDMA/rxe: Add bind MW fields to rxe_send_wr
>   RDMA/rxe: Return errors for add index and key
>   RDMA/rxe: Enable MW object pool
>   RDMA/rxe: Add ib_alloc_mw and ib_dealloc_mw verbs
>   RDMA/rxe: Replace WR_REG_MASK by WR_LOCAL_OP_MASK
>   RDMA/rxe: Move local ops to subroutine
>   RDMA/rxe: Add support for bind MW work requests
>   RDMA/rxe: Implement invalidate MW operations
>   RDMA/rxe: Implement memory access through MWs
>   RDMA/rxe: Disallow MR dereg and invalidate when bound

This is all ready now, right?

Can you put the userspace part on the github please?

Jason
Bob Pearson June 4, 2021, 3:22 a.m. UTC | #2
On 6/3/2021 1:58 PM, Jason Gunthorpe wrote:
> On Tue, May 25, 2021 at 04:37:42PM -0500, Bob Pearson wrote:
>> This series of patches implement memory windows for the rdma_rxe
>> driver. This is a shorter reimplementation of an earlier patch set.
>> They apply to and depend on the current for-next linux rdma tree.
>>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>> v8:
>>    Dropped wr.mw.flags in the rxe_send_wr struct in rdma_user_rxe.h.
>> v7:
>>    Fixed a duplicate INIT_RDMA_OBJ_SIZE(ib_mw, ...) in rxe_verbs.c.
>> v6:
>>    Added rxe_ prefix to subroutine names in lines that changed
>>    from Zhu's review of v5.
>> v5:
>>    Fixed a typo in 10th patch.
>> v4:
>>    Added a 10th patch to check when MRs have bound MWs
>>    and disallow dereg and invalidate operations.
>> v3:
>>    cleaned up void return and lower case enums from
>>    Zhu's review.
>> v2:
>>    cleaned up an issue in rdma_user_rxe.h
>>    cleaned up a collision in rxe_resp.c
>>
>> Bob Pearson (10):
>>    RDMA/rxe: Add bind MW fields to rxe_send_wr
>>    RDMA/rxe: Return errors for add index and key
>>    RDMA/rxe: Enable MW object pool
>>    RDMA/rxe: Add ib_alloc_mw and ib_dealloc_mw verbs
>>    RDMA/rxe: Replace WR_REG_MASK by WR_LOCAL_OP_MASK
>>    RDMA/rxe: Move local ops to subroutine
>>    RDMA/rxe: Add support for bind MW work requests
>>    RDMA/rxe: Implement invalidate MW operations
>>    RDMA/rxe: Implement memory access through MWs
>>    RDMA/rxe: Disallow MR dereg and invalidate when bound
> This is all ready now, right?
>
> Can you put the userspace part on the github please?
>
> Jason

I think so. The userspace change is in my tree on GitHub, but I have to
admit I forgot how to submit the pull request.

I think you sent me instructions a while back but I can't find them.

Bob
Zhu Yanjun June 4, 2021, 5:37 a.m. UTC | #3
"
[ 1041.051398] rdma_rxe: loaded
[ 1041.054536] infiniband rxe0: set active
[ 1041.054540] infiniband rxe0: added enp0s8
[ 1086.287975] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1086.311546] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1086.399826] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1090.232785] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1090.255985] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1090.345427] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1094.024322] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1094.047569] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1094.136103] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1098.989344] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1099.015065] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1099.112970] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1103.040076] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1103.062831] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1103.151157] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1116.121772] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1116.144951] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1116.234607] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1131.655486] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1131.678700] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1131.766776] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1175.517166] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1175.540258] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1175.628214] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1190.760122] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1190.783243] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1190.871167] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1249.651921] rdma_rxe: rxe-pd pool destroyed with unfree'd elem
[ 1249.651927] rdma_rxe: rxe-qp pool destroyed with unfree'd elem
[ 1249.651929] rdma_rxe: rxe-cq pool destroyed with unfree'd elem
[ 1255.227916] rdma_rxe: unloaded
"

After I added an rxe device on the netdev, I ran the rdma-core test tools.
Then I removed the rxe device and finally unloaded the rdma_rxe kernel module.
I found the logs above.
"
[ 1249.651921] rdma_rxe: rxe-pd pool destroyed with unfree'd elem
[ 1249.651927] rdma_rxe: rxe-qp pool destroyed with unfree'd elem
[ 1249.651929] rdma_rxe: rxe-cq pool destroyed with unfree'd elem
"

It seems that some resources are leaking.

I will investigate further.

Zhu Yanjun

On Fri, Jun 4, 2021 at 2:58 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, May 25, 2021 at 04:37:42PM -0500, Bob Pearson wrote:
> > This series of patches implement memory windows for the rdma_rxe
> > driver. This is a shorter reimplementation of an earlier patch set.
> > They apply to and depend on the current for-next linux rdma tree.
> >
> > Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> > ---
> > v8:
> >   Dropped wr.mw.flags in the rxe_send_wr struct in rdma_user_rxe.h.
> > v7:
> >   Fixed a duplicate INIT_RDMA_OBJ_SIZE(ib_mw, ...) in rxe_verbs.c.
> > v6:
> >   Added rxe_ prefix to subroutine names in lines that changed
> >   from Zhu's review of v5.
> > v5:
> >   Fixed a typo in 10th patch.
> > v4:
> >   Added a 10th patch to check when MRs have bound MWs
> >   and disallow dereg and invalidate operations.
> > v3:
> >   cleaned up void return and lower case enums from
> >   Zhu's review.
> > v2:
> >   cleaned up an issue in rdma_user_rxe.h
> >   cleaned up a collision in rxe_resp.c
> >
> > Bob Pearson (10):
> >   RDMA/rxe: Add bind MW fields to rxe_send_wr
> >   RDMA/rxe: Return errors for add index and key
> >   RDMA/rxe: Enable MW object pool
> >   RDMA/rxe: Add ib_alloc_mw and ib_dealloc_mw verbs
> >   RDMA/rxe: Replace WR_REG_MASK by WR_LOCAL_OP_MASK
> >   RDMA/rxe: Move local ops to subroutine
> >   RDMA/rxe: Add support for bind MW work requests
> >   RDMA/rxe: Implement invalidate MW operations
> >   RDMA/rxe: Implement memory access through MWs
> >   RDMA/rxe: Disallow MR dereg and invalidate when bound
>
> This is all ready now, right?
>
> Can you put the userspace part on the github please?
>
> Jason
Pearson, Robert B June 4, 2021, 3:25 p.m. UTC | #4
There is probably a reference to the PD hanging around. I'll look. -- Bob

-----Original Message-----
From: Zhu Yanjun <zyjzyj2000@gmail.com> 
Sent: Friday, June 4, 2021 12:38 AM
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bob Pearson <rpearsonhpe@gmail.com>; zyj2000@gmail.com; RDMA mailing list <linux-rdma@vger.kernel.org>
Subject: Re: [PATCH for-next v8 00/10] RDMA/rxe: Implement memory windows

"
[ 1041.051398] rdma_rxe: loaded
[ 1041.054536] infiniband rxe0: set active
[ 1041.054540] infiniband rxe0: added enp0s8
[ 1086.287975] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1086.311546] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1086.399826] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1090.232785] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1090.255985] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1090.345427] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1094.024322] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1094.047569] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1094.136103] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1098.989344] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1099.015065] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1099.112970] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1103.040076] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1103.062831] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1103.151157] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1116.121772] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1116.144951] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1116.234607] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1131.655486] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1131.678700] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1131.766776] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1175.517166] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1175.540258] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1175.628214] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1190.760122] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1190.783243] rdma_rxe: cqe(1) < current # elements in queue (6)
[ 1190.871167] rdma_rxe: cqe(32768) > max_cqe(32767)
[ 1249.651921] rdma_rxe: rxe-pd pool destroyed with unfree'd elem
[ 1249.651927] rdma_rxe: rxe-qp pool destroyed with unfree'd elem
[ 1249.651929] rdma_rxe: rxe-cq pool destroyed with unfree'd elem
[ 1255.227916] rdma_rxe: unloaded
"

After I added an rxe device on the netdev, I ran the rdma-core test tools.
Then I removed the rxe device and finally unloaded the rdma_rxe kernel module.
I found the logs above.
"
[ 1249.651921] rdma_rxe: rxe-pd pool destroyed with unfree'd elem
[ 1249.651927] rdma_rxe: rxe-qp pool destroyed with unfree'd elem
[ 1249.651929] rdma_rxe: rxe-cq pool destroyed with unfree'd elem
"

It seems that some resources are leaking.

I will investigate further.

Zhu Yanjun

On Fri, Jun 4, 2021 at 2:58 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, May 25, 2021 at 04:37:42PM -0500, Bob Pearson wrote:
> > This series of patches implement memory windows for the rdma_rxe 
> > driver. This is a shorter reimplementation of an earlier patch set.
> > They apply to and depend on the current for-next linux rdma tree.
> >
> > Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> > ---
> > v8:
> >   Dropped wr.mw.flags in the rxe_send_wr struct in rdma_user_rxe.h.
> > v7:
> >   Fixed a duplicate INIT_RDMA_OBJ_SIZE(ib_mw, ...) in rxe_verbs.c.
> > v6:
> >   Added rxe_ prefix to subroutine names in lines that changed
> >   from Zhu's review of v5.
> > v5:
> >   Fixed a typo in 10th patch.
> > v4:
> >   Added a 10th patch to check when MRs have bound MWs
> >   and disallow dereg and invalidate operations.
> > v3:
> >   cleaned up void return and lower case enums from
> >   Zhu's review.
> > v2:
> >   cleaned up an issue in rdma_user_rxe.h
> >   cleaned up a collision in rxe_resp.c
> >
> > Bob Pearson (10):
> >   RDMA/rxe: Add bind MW fields to rxe_send_wr
> >   RDMA/rxe: Return errors for add index and key
> >   RDMA/rxe: Enable MW object pool
> >   RDMA/rxe: Add ib_alloc_mw and ib_dealloc_mw verbs
> >   RDMA/rxe: Replace WR_REG_MASK by WR_LOCAL_OP_MASK
> >   RDMA/rxe: Move local ops to subroutine
> >   RDMA/rxe: Add support for bind MW work requests
> >   RDMA/rxe: Implement invalidate MW operations
> >   RDMA/rxe: Implement memory access through MWs
> >   RDMA/rxe: Disallow MR dereg and invalidate when bound
>
> This is all ready now, right?
>
> Can you put the userspace part on the github please?
>
> Jason
Bob Pearson June 4, 2021, 4:22 p.m. UTC | #5
On 6/4/2021 12:37 AM, Zhu Yanjun wrote:
>
> After I added a rxe device on the netdev, then run rdma-core test tools.
> Then I remove rxe device, in the end, I unloaded rdma_rxe kernel modules.
> I found the above logs.
> "
> [ 1249.651921] rdma_rxe: rxe-pd pool destroyed with unfree'd elem
> [ 1249.651927] rdma_rxe: rxe-qp pool destroyed with unfree'd elem
> [ 1249.651929] rdma_rxe: rxe-cq pool destroyed with unfree'd elem
> "
>
> It seems that  some resources leak.
>
> I will make further investigations.
>
> Zhu Yanjun

Zhu,

I suspect this is an older error. I traced all the add and drop ref
calls for PDs, then ran the full suite of Python tests, and also ran
test_mr (which includes the memory window tests) by itself, counting
the adds and drops. For test_mr alone I get 85 adds and 85 drops, but
when I run the whole suite I get 384 adds and 380 drops. Since the
memory window code is only exercised in test_mr, I think it is OK;
somewhere else there are missing drops. I will try to isolate them.
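
(For anyone who wants to reproduce the counting, something like the
hypothetical debug wrappers below, built around the existing
rxe_add_ref()/rxe_drop_ref() helpers, is enough. None of this is in the
posted series; all names here are made up for illustration.)

/* Hypothetical debug instrumentation, not part of the series: tally PD
 * reference adds/drops and print the totals at the end of a test run.
 */
static atomic_t rxe_dbg_pd_adds = ATOMIC_INIT(0);
static atomic_t rxe_dbg_pd_drops = ATOMIC_INIT(0);

#define rxe_dbg_add_pd_ref(pd)			\
	do {					\
		atomic_inc(&rxe_dbg_pd_adds);	\
		rxe_add_ref(pd);		\
	} while (0)

#define rxe_dbg_drop_pd_ref(pd)			\
	do {					\
		atomic_inc(&rxe_dbg_pd_drops);	\
		rxe_drop_ref(pd);		\
	} while (0)

static void rxe_dbg_report_pd_refs(void)
{
	pr_info("rdma_rxe: pd refs: %d adds, %d drops\n",
		atomic_read(&rxe_dbg_pd_adds),
		atomic_read(&rxe_dbg_pd_drops));
}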

Bob
Bob Pearson June 4, 2021, 5:53 p.m. UTC | #6
On 6/4/2021 11:22 AM, Pearson, Robert B wrote:
>
> On 6/4/2021 12:37 AM, Zhu Yanjun wrote:
>>
>> After I added a rxe device on the netdev, then run rdma-core test tools.
>> Then I remove rxe device, in the end, I unloaded rdma_rxe kernel 
>> modules.
>> I found the above logs.
>> "
>> [ 1249.651921] rdma_rxe: rxe-pd pool destroyed with unfree'd elem
>> [ 1249.651927] rdma_rxe: rxe-qp pool destroyed with unfree'd elem
>> [ 1249.651929] rdma_rxe: rxe-cq pool destroyed with unfree'd elem
>> "
>>
>> It seems that  some resources leak.
>>
>> I will make further investigations.
>>
>> Zhu Yanjun
>
> Zhu,
>
> I suspect this is an older error. I traced all the add and drop ref 
> calls for PDs, then ran the full suite of Python tests and also 
> test_mr which includes the memory window tests by itself and then 
> counted the adds and drops. For test_mr alone I get 85 adds and 85 
> drops but when I run the whole suite I get 384 adds and 380 drops. 
> Since the memory window code is only exercised in test_mr I think it 
> is OK. Somewhere else there are missing drops. I will try to isolate 
> them.
>
> Bob
>
Zhu,

In rdma-core/tests/test_qpex.py, test_qp_ex_rc_atomic_cmp_swp and
test_qp_ex_rc_atomic_fetch_add each have two missing PD reference drops.
This is either a test bug or a bug in the rxe driver, but it has nothing
to do with the MW code, so we should treat it as a separate error. For
some reason these test cases are not cleaning up all of their resources.

The cleanup code in all these Python tests is very implicit; it just
happens by magic, so it is hard to figure out where an ibv_destroy_qp or
ibv_destroy_cq went missing. It would help if someone who is familiar
with these tests could look at it.

Bob
Jason Gunthorpe June 4, 2021, 5:55 p.m. UTC | #7
On Fri, Jun 04, 2021 at 12:53:51PM -0500, Pearson, Robert B wrote:
> 
> On 6/4/2021 11:22 AM, Pearson, Robert B wrote:
> > 
> > On 6/4/2021 12:37 AM, Zhu Yanjun wrote:
> > > 
> > > After I added a rxe device on the netdev, then run rdma-core test tools.
> > > Then I remove rxe device, in the end, I unloaded rdma_rxe kernel
> > > modules.
> > > I found the above logs.
> > > "
> > > [ 1249.651921] rdma_rxe: rxe-pd pool destroyed with unfree'd elem
> > > [ 1249.651927] rdma_rxe: rxe-qp pool destroyed with unfree'd elem
> > > [ 1249.651929] rdma_rxe: rxe-cq pool destroyed with unfree'd elem
> > > "
> > > 
> > > It seems that  some resources leak.
> > > 
> > > I will make further investigations.
> > > 
> > > Zhu Yanjun
> > 
> > Zhu,
> > 
> > I suspect this is an older error. I traced all the add and drop ref
> > calls for PDs, then ran the full suite of Python tests and also test_mr
> > which includes the memory window tests by itself and then counted the
> > adds and drops. For test_mr alone I get 85 adds and 85 drops but when I
> > run the whole suite I get 384 adds and 380 drops. Since the memory
> > window code is only exercised in test_mr I think it is OK. Somewhere
> > else there are missing drops. I will try to isolate them.
> > 
> > Bob
> > 
> Zhu,
> 
> In rdma_core/tests/test_qpex.py test_qp_ex_rc_atomic_cmp_swp and
> test_qp_ex_rc_atomic_fetch_add each have two missing drops of PDs. This is
> either a test bug or a bug in the rxe driver but it has nothing to do with
> the MW code. We should treat it as a separate error. For some reason these
> test cases are not cleaning up all resources.
> 
> The cleanup code in all these Python tests is very implicit. It just happens
> by magic so it is hard to figure out where an ibv_destroy_qp or
> ibv_destroy_cq went missing. It would help if someone who is familiar with
> these tests could look at it.

It is impossible for userspace to leak a kernel resource: when the fd
is closed, everything is destroyed back to the driver, guaranteed by the
kernel.

As long as pyverbs has exited, pyverbs cannot be the bug.

Jason
Bob Pearson June 4, 2021, 7:26 p.m. UTC | #8
On 6/4/2021 12:55 PM, Jason Gunthorpe wrote:
> On Fri, Jun 04, 2021 at 12:53:51PM -0500, Pearson, Robert B wrote:
>> On 6/4/2021 11:22 AM, Pearson, Robert B wrote:
>>> On 6/4/2021 12:37 AM, Zhu Yanjun wrote:
>>>> After I added a rxe device on the netdev, then run rdma-core test tools.
>>>> Then I remove rxe device, in the end, I unloaded rdma_rxe kernel
>>>> modules.
>>>> I found the above logs.
>>>> "
>>>> [ 1249.651921] rdma_rxe: rxe-pd pool destroyed with unfree'd elem
>>>> [ 1249.651927] rdma_rxe: rxe-qp pool destroyed with unfree'd elem
>>>> [ 1249.651929] rdma_rxe: rxe-cq pool destroyed with unfree'd elem
>>>> "
>>>>
>>>> It seems that  some resources leak.
>>>>
>>>> I will make further investigations.
>>>>
>>>> Zhu Yanjun
>>> Zhu,
>>>
>>> I suspect this is an older error. I traced all the add and drop ref
>>> calls for PDs, then ran the full suite of Python tests and also test_mr
>>> which includes the memory window tests by itself and then counted the
>>> adds and drops. For test_mr alone I get 85 adds and 85 drops but when I
>>> run the whole suite I get 384 adds and 380 drops. Since the memory
>>> window code is only exercised in test_mr I think it is OK. Somewhere
>>> else there are missing drops. I will try to isolate them.
>>>
>>> Bob
>>>
>> Zhu,
>>
>> In rdma_core/tests/test_qpex.py test_qp_ex_rc_atomic_cmp_swp and
>> test_qp_ex_rc_atomic_fetch_add each have two missing drops of PDs. This is
>> either a test bug or a bug in the rxe driver but it has nothing to do with
>> the MW code. We should treat it as a separate error. For some reason these
>> test cases are not cleaning up all resources.
>>
>> The cleanup code in all these Python tests is very implicit. It just happens
>> by magic so it is hard to figure out where an ibv_destroy_qp or
>> ibv_destroy_cq went missing. It would help if someone who is familiar with
>> these tests could look at it.
> It is impossible for userspace to leak a kernel resource, when the fd
> is closed everything is destroyed back to the driver guarenteed by the
> kernel.
>
> As long as pyverbs has exited pyverbs cannot be the bug
>
> Jason

Thanks. That helped. Adding traces for QP references turned up the
problem. Someone took a reference on the QP in send_atomic_ack() that is
never matched by a drop. The logic was probably to protect the QP from
going away while it holds an atomic responder resource. The problem is
that, since the requester can retry the operation multiple times, the
responder never knows when to free the resource, so it doesn't: it just
recycles the resources FIFO when a new atomic request comes along, and
they are looked up by PSN. So the only logical solution is to not take
the extra reference. If you destroy the QP while a requester is retrying
an atomic operation, the retry simply fails.

I'll submit a patch deleting one line.
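
For anyone following along, the recycling behaviour described above is
roughly the following (a simplified sketch with made-up names, not the
actual driver code): resources live in a fixed ring, the oldest slot is
overwritten when a new atomic request arrives, and retries find the saved
ACK again by PSN, so there is never a teardown point where a matching drop
of the QP reference could go.

/* Simplified sketch only; field and function names are illustrative,
 * not the rxe driver's.
 */
struct atomic_res_sketch {
	u32 psn;		/* PSN the cached atomic ACK answers */
	struct sk_buff *ack;	/* cached ACK, replayed on retries   */
};

static struct atomic_res_sketch *next_atomic_res(struct atomic_res_sketch *ring,
						 unsigned int depth,
						 unsigned int *head)
{
	struct atomic_res_sketch *res = &ring[*head];

	/* The oldest slot is simply reused; nothing is ever explicitly
	 * freed, which is why a per-resource QP reference could never be
	 * dropped.
	 */
	*head = (*head + 1) % depth;
	return res;
}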

Bob