[net,0/2] net/smc: fix kernel panic caused by race of smc_sock

Message ID 20211228090325.27263-1-dust.li@linux.alibaba.com (mailing list archive)
Series net/smc: fix kernel panic caused by race of smc_sock

Message

Dust Li Dec. 28, 2021, 9:03 a.m. UTC
This patchset fixes the race between smc_release triggered by
close(2) and cdc_handle triggered by the underlying RDMA device.

The race occurs because the smc_connection may be released
before the pending tx CDC messages get their CQEs. To fix
this, I add a counter that tracks how many pending WRs we have posted
through the smc_connection, and only release the smc_connection
once there are no pending WRs on the connection.
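
As a rough illustration of the counting idea, here is a userspace
sketch. It is not the code from the patches: struct conn_sketch and
the conn_*() helpers are hypothetical names, the "free" is modeled as
a flag, and everything runs in a single thread, so it only shows the
ordering rule (release only after the last completion), not the
locking or waiting that the kernel code needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for smc_connection; not the kernel structure. */
struct conn_sketch {
	atomic_int pending_wrs;	/* WRs posted whose CQE has not arrived yet */
	bool closing;		/* set once close has been requested */
	bool freed;		/* set when the connection is actually released */
};

static void conn_init(struct conn_sketch *c)
{
	atomic_init(&c->pending_wrs, 0);
	c->closing = false;
	c->freed = false;
}

/* Release the connection only if close was requested and nothing is pending. */
static void conn_try_free(struct conn_sketch *c)
{
	if (c->closing && atomic_load(&c->pending_wrs) == 0)
		c->freed = true;
}

/* Account for one WR that was successfully posted. */
static void conn_post_wr(struct conn_sketch *c)
{
	atomic_fetch_add(&c->pending_wrs, 1);
}

/* Completion path: one CQE arrived for a previously posted WR. */
static void conn_complete_wr(struct conn_sketch *c)
{
	int prev = atomic_fetch_sub(&c->pending_wrs, 1);

	assert(prev > 0);	/* a completion without a matching post is a bug */
	if (prev == 1)		/* that was the last pending WR */
		conn_try_free(c);
}

/* Close path: the release is deferred while WRs are still pending. */
static void conn_close(struct conn_sketch *c)
{
	c->closing = true;
	conn_try_free(c);
}
```

With two WRs posted and then conn_close() called, the sketch marks the
connection as freed only after the second conn_complete_wr().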

The first patch prevents posting a WR on a QP that is not in the RTS
state. This patch is needed because if we post a WR on a QP that
is not in the RTS state, ib_post_send() may succeed but no CQE will
be returned, and that would confuse the counter tracking the pending
WRs.
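
A minimal sketch of that guard, again in userspace and with
hypothetical names (enum qp_state_sketch, struct link_sketch,
link_post_wr()); it does not call the verbs API and is not the check
used in the patch. The point it shows is that a WR is counted as
pending only when the QP is in a state where a CQE can be expected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified QP states for illustration only. */
enum qp_state_sketch { QP_RESET, QP_INIT, QP_RTR, QP_RTS, QP_ERR };

/* Hypothetical stand-in for a link and its QP. */
struct link_sketch {
	enum qp_state_sketch qp_state;
	int posted;	/* WRs handed to the simulated device */
	int pending;	/* WRs counted as awaiting a CQE */
};

/* Only a QP in RTS is treated as able to send and complete WRs. */
static bool link_sendable(const struct link_sketch *l)
{
	return l->qp_state == QP_RTS;
}

/*
 * Refuse to post when the QP is not in RTS, so the pending counter is
 * never incremented for a WR that would not produce a CQE.
 * Returns 0 on success, -1 if the link is not ready.
 */
static int link_post_wr(struct link_sketch *l)
{
	if (!link_sendable(l))
		return -1;
	l->posted++;
	l->pending++;
	return 0;
}
```

A caller would check the return value and skip the CDC/LLC message
when the link is not ready, leaving the pending count untouched.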

The second patch adds a counter to track how many WRs were posted
through the smc_connection, and no longer resets the QP when the link
is destroyed, to prevent the counter from leaking.

Dust Li (2):
  net/smc: don't send CDC/LLC message if link not ready
  net/smc: fix kernel panic caused by race of smc_sock

 net/smc/smc.h      |  5 +++++
 net/smc/smc_cdc.c  | 52 +++++++++++++++++++++-------------------------
 net/smc/smc_cdc.h  |  2 +-
 net/smc/smc_core.c | 27 ++++++++++++++++++------
 net/smc/smc_core.h |  6 ++++++
 net/smc/smc_ib.c   |  4 ++--
 net/smc/smc_ib.h   |  1 +
 net/smc/smc_llc.c  |  2 +-
 net/smc/smc_wr.c   | 45 +++++----------------------------------
 net/smc/smc_wr.h   |  5 ++---
 10 files changed, 68 insertions(+), 81 deletions(-)

Comments

patchwork-bot+netdevbpf@kernel.org Dec. 28, 2021, 12:50 p.m. UTC | #1
Hello:

This series was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Tue, 28 Dec 2021 17:03:23 +0800 you wrote:
> This patchset fixes the race between smc_release triggered by
> close(2) and cdc_handle triggered by the underlying RDMA device.
> 
> The race occurs because the smc_connection may be released
> before the pending tx CDC messages get their CQEs. To fix
> this, I add a counter that tracks how many pending WRs we have posted
> through the smc_connection, and only release the smc_connection
> once there are no pending WRs on the connection.
> 
> [...]

Here is the summary with links:
  - [net,1/2] net/smc: don't send CDC/LLC message if link not ready
    https://git.kernel.org/netdev/net/c/90cee52f2e78
  - [net,2/2] net/smc: fix kernel panic caused by race of smc_sock
    https://git.kernel.org/netdev/net/c/349d43127dac

You are awesome, thank you!