Message ID | 20210521201824.659565-1-rpearsonhpe@gmail.com (mailing list archive) |
---|---|
Series | RDMA/rxe: Implement memory windows |
On Sat, May 22, 2021 at 4:19 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> This series of patches implement memory windows for the rdma_rxe
> driver. This is a shorter reimplementation of an earlier patch set.
> They apply to and depend on the current for-next linux rdma tree.
>
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
> v7:
>   Fixed a duplicate INIT_RDMA_OBJ_SIZE(ib_mw, ...) in rxe_verbs.c.

With this patch series, there are about 17 errors and 1 failure in rdma-core.

"
----------------------------------------------------------------------
Ran 183 tests in 2.130s

FAILED (failures=1, errors=17, skipped=124)
"

After these patches, not sure if rxe can communicate with the physical
NICs correctly because of the above errors and failure.

Zhu Yanjun

> v6:
>   Added rxe_ prefix to subroutine names in lines that changed
>   from Zhu's review of v5.
> v5:
>   Fixed a typo in 10th patch.
> v4:
>   Added a 10th patch to check when MRs have bound MWs
>   and disallow dereg and invalidate operations.
> v3:
>   cleaned up void return and lower case enums from
>   Zhu's review.
> v2:
>   cleaned up an issue in rdma_user_rxe.h
>   cleaned up a collision in rxe_resp.c
>
> Bob Pearson (9):
>   RDMA/rxe: Add bind MW fields to rxe_send_wr
>   RDMA/rxe: Return errors for add index and key
>   RDMA/rxe: Enable MW object pool
>   RDMA/rxe: Add ib_alloc_mw and ib_dealloc_mw verbs
>   RDMA/rxe: Replace WR_REG_MASK by WR_LOCAL_OP_MASK
>   RDMA/rxe: Move local ops to subroutine
>   RDMA/rxe: Add support for bind MW work requests
>   RDMA/rxe: Implement invalidate MW operations
>   RDMA/rxe: Implement memory access through MWs
>
>  drivers/infiniband/sw/rxe/Makefile     |   1 +
>  drivers/infiniband/sw/rxe/rxe.c        |   1 +
>  drivers/infiniband/sw/rxe/rxe_comp.c   |   1 +
>  drivers/infiniband/sw/rxe/rxe_loc.h    |  29 +-
>  drivers/infiniband/sw/rxe/rxe_mr.c     |  79 ++++--
>  drivers/infiniband/sw/rxe/rxe_mw.c     | 356 +++++++++++++++++++++++++
>  drivers/infiniband/sw/rxe/rxe_opcode.c |  11 +-
>  drivers/infiniband/sw/rxe/rxe_opcode.h |   3 +-
>  drivers/infiniband/sw/rxe/rxe_param.h  |  19 +-
>  drivers/infiniband/sw/rxe/rxe_pool.c   |  45 ++--
>  drivers/infiniband/sw/rxe/rxe_pool.h   |   8 +-
>  drivers/infiniband/sw/rxe/rxe_req.c    | 102 ++++---
>  drivers/infiniband/sw/rxe/rxe_resp.c   | 110 +++++---
>  drivers/infiniband/sw/rxe/rxe_verbs.c  |   5 +-
>  drivers/infiniband/sw/rxe/rxe_verbs.h  |  38 ++-
>  include/uapi/rdma/rdma_user_rxe.h      |  34 ++-
>  16 files changed, 691 insertions(+), 151 deletions(-)
>  create mode 100644 drivers/infiniband/sw/rxe/rxe_mw.c
> --
> 2.27.0
>
On 5/23/2021 10:14 PM, Zhu Yanjun wrote:
> On Sat, May 22, 2021 at 4:19 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>> This series of patches implement memory windows for the rdma_rxe
>> driver. This is a shorter reimplementation of an earlier patch set.
>> They apply to and depend on the current for-next linux rdma tree.
>>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>> v7:
>>   Fixed a duplicate INIT_RDMA_OBJ_SIZE(ib_mw, ...) in rxe_verbs.c.
> With this patch series, there are about 17 errors and 1 failure in rdma-core.

Zhu,

You have to sync the kernel-header file with the kernel.

Bob

> "
> ----------------------------------------------------------------------
> Ran 183 tests in 2.130s
>
> FAILED (failures=1, errors=17, skipped=124)
> "
>
> After these patches, not sure if rxe can communicate with the physical
> NICs correctly because of the above errors and failure.
>
> Zhu Yanjun
>
>> v6:
>>   Added rxe_ prefix to subroutine names in lines that changed
>>   from Zhu's review of v5.
>> v5:
>>   Fixed a typo in 10th patch.
>> v4:
>>   Added a 10th patch to check when MRs have bound MWs
>>   and disallow dereg and invalidate operations.
>> v3:
>>   cleaned up void return and lower case enums from
>>   Zhu's review.
On Tue, May 25, 2021 at 12:04 AM Pearson, Robert B <rpearsonhpe@gmail.com> wrote:
>
> On 5/23/2021 10:14 PM, Zhu Yanjun wrote:
> > On Sat, May 22, 2021 at 4:19 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
> >> This series of patches implement memory windows for the rdma_rxe
> >> driver. This is a shorter reimplementation of an earlier patch set.
> >> They apply to and depend on the current for-next linux rdma tree.
> >>
> >> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> >> ---
> >> v7:
> >>   Fixed a duplicate INIT_RDMA_OBJ_SIZE(ib_mw, ...) in rxe_verbs.c.
> > With this patch series, there are about 17 errors and 1 failure in rdma-core.
>
> Zhu,
>
> You have to sync the kernel-header file with the kernel.

From the link
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/kbuild/headers_install.rst?h=v5.13-rc3
you mean "make headers_install"?

In fact, after "make headers_install", these patches still cause
errors and failures in rdma-core.

I will delve into these errors of rdma-core. Too many errors.

Zhu Yanjun

>
> Bob
>
> > "
> > ----------------------------------------------------------------------
> > Ran 183 tests in 2.130s
> >
> > FAILED (failures=1, errors=17, skipped=124)
> > "
> >
> > After these patches, not sure if rxe can communicate with the physical
> > NICs correctly because of the above errors and failure.
> >
> > Zhu Yanjun
> >
> >> v6:
> >>   Added rxe_ prefix to subroutine names in lines that changed
> >>   from Zhu's review of v5.
> >> v5:
> >>   Fixed a typo in 10th patch.
> >> v4:
> >>   Added a 10th patch to check when MRs have bound MWs
> >>   and disallow dereg and invalidate operations.
> >> v3:
> >>   cleaned up void return and lower case enums from
> >>   Zhu's review.
Zhu,

I'm not sure about the script. Starting from where you were I copied
<LINUX>/include/uapi/rdma/rdma_user_rxe.h to
<RDMA_CORE>/kernel-headers/rdma/rdma_user_rxe.h. After running the
script you should be able to just diff these two files to make sure
they are the same. If they aren't copy the header file over. After the
shift to 5.13 rc1+ I re-pulled both trees and applied the kernel
patches and then built everything. The python test cases look like

.............sssssssss.............sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.ssssssssssssssssssssssssss....ssss.............s.....s.......ssssssssss..ss
----------------------------------------------------------------------
Ran 182 tests in 0.380s

OK (skipped=124)

There are a lot of skips but no errors. The skips are from features
that rxe does not support.

Adding the MW rdma_core patch picks up a small number of additional
test cases involving memory windows.

Regards,

Bob

-----Original Message-----
From: Zhu Yanjun <zyjzyj2000@gmail.com>
Sent: Monday, May 24, 2021 9:09 PM
To: Pearson, Robert B <rpearsonhpe@gmail.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>; RDMA mailing list <linux-rdma@vger.kernel.org>
Subject: Re: [PATCH for-next v7 00/10] RDMA/rxe: Implement memory windows

On Tue, May 25, 2021 at 12:04 AM Pearson, Robert B <rpearsonhpe@gmail.com> wrote:
>
> On 5/23/2021 10:14 PM, Zhu Yanjun wrote:
> > On Sat, May 22, 2021 at 4:19 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
> >> This series of patches implement memory windows for the rdma_rxe
> >> driver. This is a shorter reimplementation of an earlier patch set.
> >> They apply to and depend on the current for-next linux rdma tree.
> >>
> >> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> >> ---
> >> v7:
> >>   Fixed a duplicate INIT_RDMA_OBJ_SIZE(ib_mw, ...) in rxe_verbs.c.
> > With this patch series, there are about 17 errors and 1 failure in rdma-core.
>
> Zhu,
>
> You have to sync the kernel-header file with the kernel.
On Tue, May 25, 2021 at 12:57 PM Pearson, Robert B <robert.pearson2@hpe.com> wrote:
>
> Zhu,
>
> I'm not sure about the script. Starting from where you were I copied
> <LINUX>/include/uapi/rdma/rdma_user_rxe.h to
> <RDMA_CORE>/kernel-headers/rdma/rdma_user_rxe.h. After running the
> script you should be able to just diff these two files to make sure
> they are the same. If they aren't copy the header file over. After the
> shift to 5.13 rc1+ I re-pulled both trees and applied the kernel
> patches and then built everything. The python test cases look like
>
> .............sssssssss.............sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.ssssssssssssssssssssssssss....ssss.............s.....s.......ssssssssss..ss
> ----------------------------------------------------------------------
> Ran 182 tests in 0.380s

Thanks. Please submit a new patch for this problem.

>
> OK (skipped=124)
>
> There are a lot of skips but no errors. The skips are from features
> that rxe does not support.
>
> Adding the MW rdma_core patch picks up a small number of additional
> test cases involving memory windows.

Thanks a lot. Look forward to these additional test cases involving
memory windows.

Zhu Yanjun

>
> Regards,
>
> Bob
>
> -----Original Message-----
> From: Zhu Yanjun <zyjzyj2000@gmail.com>
> Sent: Monday, May 24, 2021 9:09 PM
> To: Pearson, Robert B <rpearsonhpe@gmail.com>
> Cc: Jason Gunthorpe <jgg@nvidia.com>; RDMA mailing list <linux-rdma@vger.kernel.org>
> Subject: Re: [PATCH for-next v7 00/10] RDMA/rxe: Implement memory windows
>
> On Tue, May 25, 2021 at 12:04 AM Pearson, Robert B <rpearsonhpe@gmail.com> wrote:
> >
> > On 5/23/2021 10:14 PM, Zhu Yanjun wrote:
> > > On Sat, May 22, 2021 at 4:19 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
> > >> This series of patches implement memory windows for the rdma_rxe
> > >> driver. This is a shorter reimplementation of an earlier patch set.
> > >> They apply to and depend on the current for-next linux rdma tree.
There's nothing to change. There is no problem. Just get the headers
sync'ed. If that doesn't fix your issues your tree has gotten corrupted
somehow. But, I don't think that is the issue. I saw the same type of
errors you reported when rdma_core is built with the old header file.
That definitely will cause problems. The size of the send queue WQEs
changed because new fields were added. Then user space and the kernel
immediately get off from each other.

Good luck,

Bob

-----Original Message-----
From: Zhu Yanjun <zyjzyj2000@gmail.com>
Sent: Tuesday, May 25, 2021 12:18 AM
To: Pearson, Robert B <robert.pearson2@hpe.com>
Cc: Pearson, Robert B <rpearsonhpe@gmail.com>; Jason Gunthorpe <jgg@nvidia.com>; RDMA mailing list <linux-rdma@vger.kernel.org>
Subject: Re: [PATCH for-next v7 00/10] RDMA/rxe: Implement memory windows

On Tue, May 25, 2021 at 12:57 PM Pearson, Robert B <robert.pearson2@hpe.com> wrote:
>
> Zhu,
>
> I'm not sure about the script. Starting from where you were I copied
> <LINUX>/include/uapi/rdma/rdma_user_rxe.h to
> <RDMA_CORE>/kernel-headers/rdma/rdma_user_rxe.h. After running the
> script you should be able to just diff these two files to make sure
> they are the same. If they aren't copy the header file over. After the
> shift to 5.13 rc1+ I re-pulled both trees and applied the kernel
> patches and then built everything. The python test cases look like
>
> .............sssssssss.............sssssssssssssssssssssssssssssssssss
> ssssssssssssssssssssssssssssssssssss.ssssssssssssssssssssssssss....sss
> s.............s.....s.......ssssssssss..ss
> ----------------------------------------------------------------------
> Ran 182 tests in 0.380s

Thanks. Please submit a new patch for this problem.

>
> OK (skipped=124)
>
> There are a lot of skips but no errors. The skips are from features
> that rxe does not support.
>
> Adding the MW rdma_core patch picks up a small number of additional
> test cases involving memory windows.

Thanks a lot.
On Tue, May 25, 2021 at 1:27 PM Pearson, Robert B <robert.pearson2@hpe.com> wrote:
>
> There's nothing to change. There is no problem. Just get the headers sync'ed.

I delved into the errors. I found that the following would fix these
errors in rdma-core.

diff --git a/kernel-headers/rdma/rdma_user_rxe.h b/kernel-headers/rdma/rdma_user_rxe.h
index 068433e2..90ea477f 100644
--- a/kernel-headers/rdma/rdma_user_rxe.h
+++ b/kernel-headers/rdma/rdma_user_rxe.h
@@ -99,7 +99,17 @@ struct rxe_send_wr {
                        __u32   remote_qkey;
                        __u16   pkey_index;
                } ud;
+               struct {
+                       __aligned_u64   addr;
+                       __aligned_u64   length;
+                       __u32           mr_lkey;
+                       __u32           mw_rkey;
+                       __u32           rkey;
+                       __u32           access;
+                       __u32           flags;
+               } mw;
                /* reg is only used by the kernel and is not part of the uapi */
+#ifdef __KERNEL__
                struct {
                        union {
                                struct ib_mr *mr;
@@ -108,6 +118,7 @@ struct rxe_send_wr {
                        __u32        key;
                        __u32        access;
                } reg;
+#endif
        } wr;
 };

Zhu Yanjun

> If that doesn't fix your issues your tree has gotten corrupted
> somehow. But, I don't think that is the issue. I saw the same type of
> errors you reported when rdma_core is built with the old header file.
> That definitely will cause problems. The size of the send queue WQEs
> changed because new fields were added. Then user space and the kernel
> immediately get off from each other.
>
> Good luck,
>
> Bob
>
> -----Original Message-----
> From: Zhu Yanjun <zyjzyj2000@gmail.com>
> Sent: Tuesday, May 25, 2021 12:18 AM
> To: Pearson, Robert B <robert.pearson2@hpe.com>
> Cc: Pearson, Robert B <rpearsonhpe@gmail.com>; Jason Gunthorpe <jgg@nvidia.com>; RDMA mailing list <linux-rdma@vger.kernel.org>
> Subject: Re: [PATCH for-next v7 00/10] RDMA/rxe: Implement memory windows
>
> On Tue, May 25, 2021 at 12:57 PM Pearson, Robert B <robert.pearson2@hpe.com> wrote:
> >
> > Zhu,
> >
> > I'm not sure about the script. Starting from where you were I copied
> > <LINUX>/include/uapi/rdma/rdma_user_rxe.h to
> > <RDMA_CORE>/kernel-headers/rdma/rdma_user_rxe.h.
On Tue, May 25, 2021 at 1:27 PM Pearson, Robert B <robert.pearson2@hpe.com> wrote:
>
> There's nothing to change. There is no problem. Just get the headers sync'ed.
> If that doesn't fix your issues your tree has gotten corrupted
> somehow. But, I don't think that is the issue. I saw the same type of
> errors you reported when rdma_core is built with the old header file.
> That definitely will cause problems. The size of the send queue WQEs
> changed because new fields were added. Then user space and the kernel
> immediately get off from each other.
>
> Good luck,

About rdma-core, the root cause is clear. I am fine with this patch series.

Thanks, Bob.

Zhu Yanjun

>
> Bob
>
> -----Original Message-----
> From: Zhu Yanjun <zyjzyj2000@gmail.com>
> Sent: Tuesday, May 25, 2021 12:18 AM
> To: Pearson, Robert B <robert.pearson2@hpe.com>
> Cc: Pearson, Robert B <rpearsonhpe@gmail.com>; Jason Gunthorpe <jgg@nvidia.com>; RDMA mailing list <linux-rdma@vger.kernel.org>
> Subject: Re: [PATCH for-next v7 00/10] RDMA/rxe: Implement memory windows
>
> On Tue, May 25, 2021 at 12:57 PM Pearson, Robert B <robert.pearson2@hpe.com> wrote:
> >
> > Zhu,
> >
> > I'm not sure about the script. Starting from where you were I copied
> > <LINUX>/include/uapi/rdma/rdma_user_rxe.h to
> > <RDMA_CORE>/kernel-headers/rdma/rdma_user_rxe.h. After running the
> > script you should be able to just diff these two files to make sure
> > they are the same. If they aren't copy the header file over. After the
> > shift to 5.13 rc1+ I re-pulled both trees and applied the kernel
> > patches and then built everything. The python test cases look like
> >
> > .............sssssssss.............sssssssssssssssssssssssssssssssssss
> > ssssssssssssssssssssssssssssssssssss.ssssssssssssssssssssssssss....sss
> > s.............s.....s.......ssssssssss..ss
> > ----------------------------------------------------------------------
> > Ran 182 tests in 0.380s
>
> Thanks. Please submit a new patch for this problem.
> > > > > OK (skipped=124) > > > > There are a lot of skips but no errors. The skips are from features that rxe does not support. > > > > Adding the MW rdma_core patch picks up a small number of additional test cases involving memory windows. > > Thanks a lot. Look forward to these additional test cases involving memory windows. > > Zhu Yanjun > > > > > Regards, > > > > Bob > > > > -----Original Message----- > > From: Zhu Yanjun <zyjzyj2000@gmail.com> > > Sent: Monday, May 24, 2021 9:09 PM > > To: Pearson, Robert B <rpearsonhpe@gmail.com> > > Cc: Jason Gunthorpe <jgg@nvidia.com>; RDMA mailing list > > <linux-rdma@vger.kernel.org> > > Subject: Re: [PATCH for-next v7 00/10] RDMA/rxe: Implement memory > > windows > > > > On Tue, May 25, 2021 at 12:04 AM Pearson, Robert B <rpearsonhpe@gmail.com> wrote: > > > > > > On 5/23/2021 10:14 PM, Zhu Yanjun wrote: > > > > On Sat, May 22, 2021 at 4:19 AM Bob Pearson <rpearsonhpe@gmail.com> wrote: > > > >> This series of patches implement memory windows for the rdma_rxe > > > >> driver. This is a shorter reimplementation of an earlier patch set. > > > >> They apply to and depend on the current for-next linux rdma tree. > > > >> > > > >> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> > > > >> --- > > > >> v7: > > > >> Fixed a duplicate INIT_RDMA_OBJ_SIZE(ib_mw, ...) in rxe_verbs.c. > > > > With this patch series, there are about 17 errors and 1 failure in rdma-core. > > > > > > Zhu, > > > > > > You have to sync the kernel-header file with the kernel. > > > > From the link > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre > > e/Documentation/kbuild/headers_install.rst?h=v5.13-rc3 > > you mean "make headers_install"? > > > > In fact, after "make headers_install", these patches still cause errors and failures in rdma-core. > > > > I will delve into these errors of rdma-core. Too many errors. 
> > > > Zhu Yanjun > > > > > > > > Bob > > > > > > > " > > > > ------------------------------------------------------------------ > > > > -- > > > > -- > > > > Ran 183 tests in 2.130s > > > > > > > > FAILED (failures=1, errors=17, skipped=124) " > > > > > > > > After these patches, not sure if rxe can communicate with the > > > > physical NICs correctly because of the above errors and failure. > > > > > > > > Zhu Yanjun > > > > > > > >> v6: > > > >> Added rxe_ prefix to subroutine names in lines that changed > > > >> from Zhu's review of v5. > > > >> v5: > > > >> Fixed a typo in 10th patch. > > > >> v4: > > > >> Added a 10th patch to check when MRs have bound MWs > > > >> and disallow dereg and invalidate operations. > > > >> v3: > > > >> cleaned up void return and lower case enums from > > > >> Zhu's review. > > > >> v2: > > > >> cleaned up an issue in rdma_user_rxe.h > > > >> cleaned up a collision in rxe_resp.c > > > >> > > > >> Bob Pearson (9): > > > >> RDMA/rxe: Add bind MW fields to rxe_send_wr > > > >> RDMA/rxe: Return errors for add index and key > > > >> RDMA/rxe: Enable MW object pool > > > >> RDMA/rxe: Add ib_alloc_mw and ib_dealloc_mw verbs > > > >> RDMA/rxe: Replace WR_REG_MASK by WR_LOCAL_OP_MASK > > > >> RDMA/rxe: Move local ops to subroutine > > > >> RDMA/rxe: Add support for bind MW work requests > > > >> RDMA/rxe: Implement invalidate MW operations > > > >> RDMA/rxe: Implement memory access through MWs > > > >> > > > >> drivers/infiniband/sw/rxe/Makefile | 1 + > > > >> drivers/infiniband/sw/rxe/rxe.c | 1 + > > > >> drivers/infiniband/sw/rxe/rxe_comp.c | 1 + > > > >> drivers/infiniband/sw/rxe/rxe_loc.h | 29 +- > > > >> drivers/infiniband/sw/rxe/rxe_mr.c | 79 ++++-- > > > >> drivers/infiniband/sw/rxe/rxe_mw.c | 356 +++++++++++++++++++++++++ > > > >> drivers/infiniband/sw/rxe/rxe_opcode.c | 11 +- > > > >> drivers/infiniband/sw/rxe/rxe_opcode.h | 3 +- > > > >> drivers/infiniband/sw/rxe/rxe_param.h | 19 +- > > > >> drivers/infiniband/sw/rxe/rxe_pool.c 
| 45 ++-- > > > >> drivers/infiniband/sw/rxe/rxe_pool.h | 8 +- > > > >> drivers/infiniband/sw/rxe/rxe_req.c | 102 ++++--- > > > >> drivers/infiniband/sw/rxe/rxe_resp.c | 110 +++++--- > > > >> drivers/infiniband/sw/rxe/rxe_verbs.c | 5 +- > > > >> drivers/infiniband/sw/rxe/rxe_verbs.h | 38 ++- > > > >> include/uapi/rdma/rdma_user_rxe.h | 34 ++- > > > >> 16 files changed, 691 insertions(+), 151 deletions(-) > > > >> create mode 100644 drivers/infiniband/sw/rxe/rxe_mw.c > > > >> -- > > > >> 2.27.0 > > > >>
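The header check quoted above (copy `<LINUX>/include/uapi/rdma/rdma_user_rxe.h` over `<RDMA_CORE>/kernel-headers/rdma/rdma_user_rxe.h`, then diff the two) could be scripted roughly as below. This is only a sketch: the helper name `headers_in_sync` and the two root-directory arguments are hypothetical, and it assumes both trees are checked out locally. Only the two relative paths come from the thread.

```python
import filecmp
from pathlib import Path

# Relative locations taken from the thread; the root directories are
# whatever local checkouts are in use (an assumption of this sketch).
KERNEL_HEADER = Path("include/uapi/rdma/rdma_user_rxe.h")
RDMA_CORE_HEADER = Path("kernel-headers/rdma/rdma_user_rxe.h")


def headers_in_sync(linux_root, rdma_core_root):
    """Return True if both copies of rdma_user_rxe.h have identical bytes."""
    kernel_copy = Path(linux_root) / KERNEL_HEADER
    core_copy = Path(rdma_core_root) / RDMA_CORE_HEADER
    # shallow=False compares file contents instead of trusting
    # os.stat() metadata such as size and modification time.
    return filecmp.cmp(kernel_copy, core_copy, shallow=False)
```

If the function returns False, rdma-core is being built against a different uapi header than the kernel under test, which is the situation described in the quoted messages.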
On further reflection I realize I did not correctly understand the user/kernel API issue. I was assuming that the user application should continue to run but that we could require re-compiling rdma-core. If we require that old rdma-core binaries run on newer kernels then the 40 bytes is an issue. I always recompiled rdma-core and didn't test running with old binaries. Fortunately there is an easy fix. The flags field in the earlier rxe mw version had one bit in it but the new version dropped that and I never went back and removed the field. Dropping the flags field doesn't break anything but lets the mw struct fit in the wr union without extending it.

I will fix, retest and resubmit.

Bob

-----Original Message-----
From: Zhu Yanjun <zyjzyj2000@gmail.com>
Sent: Tuesday, May 25, 2021 10:00 AM
To: Pearson, Robert B <robert.pearson2@hpe.com>
Cc: Pearson, Robert B <rpearsonhpe@gmail.com>; Jason Gunthorpe <jgg@nvidia.com>; RDMA mailing list <linux-rdma@vger.kernel.org>
Subject: Re: [PATCH for-next v7 00/10] RDMA/rxe: Implement memory windows

On Tue, May 25, 2021 at 1:27 PM Pearson, Robert B <robert.pearson2@hpe.com> wrote:
>
> There's nothing to change. There is no problem. Just get the headers sync'ed.
> If that doesn't fix your issues your tree has gotten corrupted somehow. But, I don't think that is the issue. I saw the same type of errors you reported when rdma_core is built with the old header file. That definitely will cause problems. The size of the send queue WQEs changed because new fields were added. Then user space and the kernel immediately get off from each other.
>
> Good luck,

About rdma-core, the root cause is clear. I am fine with this patch series.
Thanks, Bob.
Zhu Yanjun
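The size problem described in the message above (an extra field pushing the mw member past the existing size of the wr union, and dropping `flags` bringing it back) can be illustrated with `ctypes`. The field lists below are assumptions made for illustration and are not copied from `rdma_user_rxe.h`. The thread itself only confirms that the larger layout was 40 bytes and that removing the flags field lets the mw struct fit without extending the union.

```python
import ctypes

U32 = ctypes.c_uint32
U64 = ctypes.c_uint64


class Atomic(ctypes.Structure):
    # Hypothetical stand-in for the largest pre-existing union member.
    _fields_ = [("remote_addr", U64), ("compare_add", U64), ("swap", U64),
                ("rkey", U32), ("reserved", U32)]


class MwWithFlags(ctypes.Structure):
    # Hypothetical bind-MW member that still carries a leftover flags field.
    _fields_ = [("addr", U64), ("length", U64), ("mr_lkey", U32),
                ("mw_rkey", U32), ("rkey", U32), ("access", U32),
                ("flags", U32)]


class MwWithoutFlags(ctypes.Structure):
    # Same hypothetical member with the flags field dropped.
    _fields_ = [("addr", U64), ("length", U64), ("mr_lkey", U32),
                ("mw_rkey", U32), ("rkey", U32), ("access", U32)]


class WrBefore(ctypes.Union):
    # Union as old user-space binaries would have been compiled against it.
    _fields_ = [("atomic", Atomic)]


class WrWithFlags(ctypes.Union):
    _fields_ = [("atomic", Atomic), ("mw", MwWithFlags)]


class WrWithoutFlags(ctypes.Union):
    _fields_ = [("atomic", Atomic), ("mw", MwWithoutFlags)]


def union_sizes():
    """Return the size in bytes of each variant of the sketch union."""
    return {cls.__name__: ctypes.sizeof(cls)
            for cls in (WrBefore, WrWithFlags, WrWithoutFlags)}
```

Under these assumed layouts, `union_sizes()` should show `WrWithFlags` larger than `WrBefore` (40 bytes on a typical 64-bit build, since the struct is padded to 8-byte alignment), while `WrWithoutFlags` matches `WrBefore`. A union that grows changes the WQE stride, so a user-space binary built against the old header and a kernel built against the new one would index the send queue differently.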
On 5/25/2021 10:23 AM, Pearson, Robert B wrote:
> On further reflection I realize I did not correctly understand the user/kernel API issue. I was assuming that the user application should continue to run but that we could require re-compiling rdma-core. If we require that old rdma-core binaries run on newer kernels then the 40 bytes is an issue. I always recompiled rdma-core and didn't test running with old binaries. Fortunately there is an easy fix. The flags field in the earlier rxe mw version had one bit in it but the new version dropped that and I never went back and removed the field. Dropping the flags field doesn't break anything but lets the mw struct fit in the wr union without extending it.
>
> I will fix, retest and resubmit.
>
> Bob
>
> -----Original Message-----
> From: Zhu Yanjun <zyjzyj2000@gmail.com>
> Sent: Tuesday, May 25, 2021 10:00 AM
> To: Pearson, Robert B <robert.pearson2@hpe.com>
> Cc: Pearson, Robert B <rpearsonhpe@gmail.com>; Jason Gunthorpe <jgg@nvidia.com>; RDMA mailing list <linux-rdma@vger.kernel.org>
> Subject: Re: [PATCH for-next v7 00/10] RDMA/rxe: Implement memory windows
>
> On Tue, May 25, 2021 at 1:27 PM Pearson, Robert B <robert.pearson2@hpe.com> wrote:
>> There's nothing to change. There is no problem. Just get the headers sync'ed.
>> If that doesn't fix your issues your tree has gotten corrupted somehow. But, I don't think that is the issue. I saw the same type of errors you reported when rdma_core is built with the old header file. That definitely will cause problems. The size of the send queue WQEs changed because new fields were added. Then user space and the kernel immediately get off from each other.
>>
>> Good luck,
> About rdma-core, the root cause is clear. I am fine with this patch series.
> Thanks, Bob.
>
> Zhu Yanjun
>
Well. Interesting. Having pulled latest rdma-core again and fixed the wr.mw size issue I now see a bunch of CQ and QP errors which have nothing to do with the memory windows patches. It looks more like a memory ordering problem around the queues. Is this possibly related to the recent relaxed ordering changes??

The one py test failure I have chased down is in the resize cq test. The first time it runs after building a new module I can print out the new cqe and the current queue count and see the expected 1 which is less than 6 but the code takes the wrong branch and does not report an error. Rerunning the test I get the expected behavior and the test passes. This will take a bit of effort.

Bob
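For readers following the resize cq discussion, the condition being described (a requested cqe of 1 against 6 entries already queued should be rejected) amounts to a check like the one below. This is a hypothetical sketch of the expected behavior, not the rxe driver code. The function name and the use of `ValueError` are assumptions made for illustration.

```python
def check_resize_cqe(new_cqe, queued):
    """Return new_cqe if it can hold the queued completions, else raise."""
    # A resize must not shrink the queue below what it already holds.
    if new_cqe < queued:
        raise ValueError(
            f"cqe({new_cqe}) < current number of elements in queue ({queued})"
        )
    return new_cqe
```

With the values from the message, `check_resize_cqe(1, 6)` raises. The failure reported above is that the first run after loading a freshly built module did not take this error path.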
On Tue, May 25, 2021 at 01:09:01PM -0500, Pearson, Robert B wrote:
>
> On 5/25/2021 10:23 AM, Pearson, Robert B wrote:
> > On further reflection I realize I did not correctly understand the user/kernel API issue. I was assuming that the user application should continue to run but that we could require re-compiling rdma-core. If we require that old rdma-core binaries run on newer kernels then the 40 bytes is an issue. I always recompiled rdma-core and didn't test running with old binaries. Fortunately there is an easy fix. The flags field in the earlier rxe mw version had one bit in it but the new version dropped that and I never went back and removed the field. Dropping the flags field doesn't break anything but lets the mw struct fit in the wr union without extending it.
> >
> > I will fix, retest and resubmit.
> >
> > Bob
> >
> > From: Zhu Yanjun <zyjzyj2000@gmail.com>
> > Sent: Tuesday, May 25, 2021 10:00 AM
> > To: Pearson, Robert B <robert.pearson2@hpe.com>
> > Cc: Pearson, Robert B <rpearsonhpe@gmail.com>; Jason Gunthorpe <jgg@nvidia.com>; RDMA mailing list <linux-rdma@vger.kernel.org>
> > Subject: Re: [PATCH for-next v7 00/10] RDMA/rxe: Implement memory windows
> >
> > On Tue, May 25, 2021 at 1:27 PM Pearson, Robert B <robert.pearson2@hpe.com> wrote:
> > > There's nothing to change. There is no problem. Just get the headers sync'ed.
> > > If that doesn't fix your issues your tree has gotten corrupted somehow. But, I don't think that is the issue. I saw the same type of errors you reported when rdma_core is built with the old header file. That definitely will cause problems. The size of the send queue WQEs changed because new fields were added. Then user space and the kernel immediately get off from each other.
> > >
> > > Good luck,
> > About rdma-core, the root cause is clear. I am fine with this patch series.
> > Thanks, Bob.
> >
> > Zhu Yanjun
> >
> Well. Interesting. Having pulled latest rdma-core again and fixed the wr.mw
> size issue I now see a bunch of CQ and QP errors which have nothing to do
> with the memory windows patches. It looks more like a memory ordering
> problem around the queues. Is this possibly related to the recent relaxed
> ordering changes??

They haven't been merged and wouldn't affect a SW driver like rxe.

> The one py test failure I have chased down is in the resize cq
> test. The first time it runs after building a new module I can print
> out the new cqe and the current queue count and see the expected 1
> which is less than 6 but the code takes the wrong branch and does
> not report an error. Rerunning the test I get the expected behavior
> and the test passes. This will take a bit of effort.

Bisect the kernel?

Jason
This series of patches implements memory windows for the rdma_rxe
driver. This is a shorter reimplementation of an earlier patch set.
They apply to and depend on the current for-next linux rdma tree.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
v7:
  Fixed a duplicate INIT_RDMA_OBJ_SIZE(ib_mw, ...) in rxe_verbs.c.
v6:
  Added rxe_ prefix to subroutine names in lines that changed
  from Zhu's review of v5.
v5:
  Fixed a typo in 10th patch.
v4:
  Added a 10th patch to check when MRs have bound MWs
  and disallow dereg and invalidate operations.
v3:
  cleaned up void return and lower case enums from
  Zhu's review.
v2:
  cleaned up an issue in rdma_user_rxe.h
  cleaned up a collision in rxe_resp.c

Bob Pearson (9):
  RDMA/rxe: Add bind MW fields to rxe_send_wr
  RDMA/rxe: Return errors for add index and key
  RDMA/rxe: Enable MW object pool
  RDMA/rxe: Add ib_alloc_mw and ib_dealloc_mw verbs
  RDMA/rxe: Replace WR_REG_MASK by WR_LOCAL_OP_MASK
  RDMA/rxe: Move local ops to subroutine
  RDMA/rxe: Add support for bind MW work requests
  RDMA/rxe: Implement invalidate MW operations
  RDMA/rxe: Implement memory access through MWs

 drivers/infiniband/sw/rxe/Makefile     |   1 +
 drivers/infiniband/sw/rxe/rxe.c        |   1 +
 drivers/infiniband/sw/rxe/rxe_comp.c   |   1 +
 drivers/infiniband/sw/rxe/rxe_loc.h    |  29 +-
 drivers/infiniband/sw/rxe/rxe_mr.c     |  79 ++++--
 drivers/infiniband/sw/rxe/rxe_mw.c     | 356 +++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_opcode.c |  11 +-
 drivers/infiniband/sw/rxe/rxe_opcode.h |   3 +-
 drivers/infiniband/sw/rxe/rxe_param.h  |  19 +-
 drivers/infiniband/sw/rxe/rxe_pool.c   |  45 ++--
 drivers/infiniband/sw/rxe/rxe_pool.h   |   8 +-
 drivers/infiniband/sw/rxe/rxe_req.c    | 102 ++++---
 drivers/infiniband/sw/rxe/rxe_resp.c   | 110 +++++---
 drivers/infiniband/sw/rxe/rxe_verbs.c  |   5 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h  |  38 ++-
 include/uapi/rdma/rdma_user_rxe.h      |  34 ++-
 16 files changed, 691 insertions(+), 151 deletions(-)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_mw.c