Message ID | 20230304174533.11296-1-rpearsonhpe@gmail.com (mailing list archive) |
---|---|
Series | RDMA/rxe: Correct qp reference counting |
On Sat, Mar 04, 2023 at 11:45:26AM -0600, Bob Pearson wrote:
> This patch series corrects qp reference counting issues
> related to deferred execution of tasklets. These issues were
> discovered in attempting to resolve soft lockups of the rxe
> driver observed by Daisuke Matsuda in a version of the driver
> using work queues where the workqueue implementation was based
> on the current tasklet based driver. An attempt to find the
> root cause of those lockups led to an error in the tasklet
> implementation that has been present since the driver went
> upstream. This patch series corrects that error.
>
> With this patch series applied the rxe driver is more stable and
> has run the test cases reported by Matsuda for over 24 hours without
> errors.
>
> The series also corrects some errors in qp reference counting
> related to qp cleanup.
>
> This series depends on the "RDMA/rxe: Add error logging to rxe"
> series as a prerequisite.
>
> Link: https://lore.kernel.org/linux-rdma/TYCPR01MB845522FD536170D75068DD41E5099@TYCPR01MB8455.jpnprd01.prod.outlook.com/
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>
> v3:
> Fixed an error in patch 4/8 "RDMA/rxe: Cleanup error state handling in
> rxe_comp.c". Didn't set wqe.status to IB_WC_WR_FLUSH_ERR when
> flushing send queue. This broke blktests which calls modify qp to
> set qp to IB_QPS_ERR and waits for the flushed cqe's.
>
> v2:
> This version of this series split off the changes to rxe debug code
> which have been submitted as "RDMA/rxe: Add error logging to rxe".
> One unrelated patch was dropped and other patches earlier included
> in a series to convert from tasklets to workqueues were moved into
> this series because they are relevant both for the tasklet version
> and the workqueue version of the driver.
>
> Bob Pearson (8):
>   RDMA/rxe: Convert tasklet args to queue pairs
>   RDMA/rxe: Cleanup reset state handling in rxe_resp.c
>   RDMA/rxe: Cleanup error state handling in rxe_comp.c
>   RDMA/rxe: Remove qp reference counting in tasks
>   RDMA/rxe: Remove __rxe_do_task()
>   RDMA/rxe: Make tasks schedule each other
>   RDMA/rxe: Rewrite rxe_task.c

Applied to for-next

>   RDMA/rxe: Warn if refcnt zero in rxe_put

This one I dropped

Thanks,
Jason
This patch series corrects qp reference counting issues
related to deferred execution of tasklets. These issues were
discovered in attempting to resolve soft lockups of the rxe
driver observed by Daisuke Matsuda in a version of the driver
using work queues where the workqueue implementation was based
on the current tasklet based driver. An attempt to find the
root cause of those lockups led to an error in the tasklet
implementation that has been present since the driver went
upstream. This patch series corrects that error.

With this patch series applied the rxe driver is more stable and
has run the test cases reported by Matsuda for over 24 hours without
errors.

The series also corrects some errors in qp reference counting
related to qp cleanup.

This series depends on the "RDMA/rxe: Add error logging to rxe"
series as a prerequisite.

Link: https://lore.kernel.org/linux-rdma/TYCPR01MB845522FD536170D75068DD41E5099@TYCPR01MB8455.jpnprd01.prod.outlook.com/
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>

v3:
Fixed an error in patch 4/8 "RDMA/rxe: Cleanup error state handling in
rxe_comp.c". Didn't set wqe.status to IB_WC_WR_FLUSH_ERR when
flushing send queue. This broke blktests which calls modify qp to
set qp to IB_QPS_ERR and waits for the flushed cqe's.

v2:
This version of this series split off the changes to rxe debug code
which have been submitted as "RDMA/rxe: Add error logging to rxe".
One unrelated patch was dropped and other patches earlier included
in a series to convert from tasklets to workqueues were moved into
this series because they are relevant both for the tasklet version
and the workqueue version of the driver.
Bob Pearson (8):
  RDMA/rxe: Convert tasklet args to queue pairs
  RDMA/rxe: Warn if refcnt zero in rxe_put
  RDMA/rxe: Cleanup reset state handling in rxe_resp.c
  RDMA/rxe: Cleanup error state handling in rxe_comp.c
  RDMA/rxe: Remove qp reference counting in tasks
  RDMA/rxe: Remove __rxe_do_task()
  RDMA/rxe: Make tasks schedule each other
  RDMA/rxe: Rewrite rxe_task.c

 drivers/infiniband/sw/rxe/rxe.h      |   1 -
 drivers/infiniband/sw/rxe/rxe_comp.c |  71 +++++--
 drivers/infiniband/sw/rxe/rxe_cq.c   |   1 +
 drivers/infiniband/sw/rxe/rxe_loc.h  |   6 +-
 drivers/infiniband/sw/rxe/rxe_pool.c |   2 +
 drivers/infiniband/sw/rxe/rxe_qp.c   |  56 ++----
 drivers/infiniband/sw/rxe/rxe_req.c  |  12 +-
 drivers/infiniband/sw/rxe/rxe_resp.c | 114 ++++++------
 drivers/infiniband/sw/rxe/rxe_task.c | 268 +++++++++++++++++++++------
 drivers/infiniband/sw/rxe/rxe_task.h |  23 ++-
 10 files changed, 353 insertions(+), 201 deletions(-)

base-commit: c48438c1e307a86a2c20c4c42120e580de4d8dbc
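
Archive note: the cover letter concerns objects that must stay alive while deferred work that uses them is still pending. The following is a minimal userspace sketch of that general pattern only, not code from the series and not the rxe implementation. All names (demo_qp, demo_task, and the demo_* helpers) are hypothetical, the counter is a plain int rather than an atomic, and the "task" is a single-threaded one-slot stand-in for a tasklet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a queue pair; NOT the rxe structure. */
struct demo_qp {
	int refcnt;      /* outstanding references (plain int, no atomics) */
	bool destroyed;  /* set when the last reference is dropped */
	int work_done;   /* how many times the deferred work has run */
};

/* Hypothetical one-slot deferred work item standing in for a tasklet. */
struct demo_task {
	struct demo_qp *qp;
	bool pending;
};

/* The creator of the object owns the initial reference. */
static void demo_qp_init(struct demo_qp *qp)
{
	qp->refcnt = 1;
	qp->destroyed = false;
	qp->work_done = 0;
}

static void demo_qp_get(struct demo_qp *qp)
{
	assert(qp->refcnt > 0);  /* taking a ref on a dead object is a bug */
	qp->refcnt++;
}

static void demo_qp_put(struct demo_qp *qp)
{
	assert(qp->refcnt > 0);  /* dropping below zero is a bug */
	if (--qp->refcnt == 0)
		qp->destroyed = true;  /* a real object would be freed here */
}

/*
 * Schedule the deferred work. The reference is taken at schedule time,
 * on behalf of the pending work, so the object cannot go away between
 * scheduling and execution. Returns false if work is already pending.
 */
static bool demo_task_schedule(struct demo_task *task)
{
	if (task->pending)
		return false;
	demo_qp_get(task->qp);
	task->pending = true;
	return true;
}

/* Run the deferred work later and drop the reference it was holding. */
static void demo_task_run(struct demo_task *task)
{
	if (!task->pending)
		return;
	assert(!task->qp->destroyed);  /* guaranteed by the held reference */
	task->qp->work_done++;
	task->pending = false;
	demo_qp_put(task->qp);
}
```

In this sketch the owner may drop its own reference while work is pending; the object is only marked destroyed once the deferred work has run and released the reference taken when it was scheduled.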