mbox series

[for-next,0/6] RDMA: Use refcount_t for reference counting

Message ID 1620958299-4869-1-git-send-email-liweihang@huawei.com (mailing list archive)
Headers show
Series RDMA: Use refcount_t for reference counting | expand

Message

Weihang Li May 14, 2021, 2:11 a.m. UTC
There are some refcnt in type of atomic_t in RDMA subsystem, almost all of
them is wrote before the refcount_t is acheived in kernel. refcount_t is
better than integer for reference counting, it will WARN on
overflow/underflow and avoid use-after-free risks.

Weihang Li (6):
  RDMA/core: Use refcount_t instead of atomic_t for reference counting
  RDMA/hns: Use refcount_t instead of atomic_t for reference counting
  RDMA/hns: Use refcount_t APIs for HEM
  RDMA/cxgb4: Use refcount_t instead of atomic_t for reference counting
  RDMA/i40iw: Use refcount_t instead of atomic_t for reference counting
  RDMA/ipoib: Use refcount_t instead of atomic_t for reference counting

 drivers/infiniband/core/iwcm.c              |  9 +++--
 drivers/infiniband/core/iwcm.h              |  2 +-
 drivers/infiniband/core/iwpm_util.c         | 12 ++++---
 drivers/infiniband/core/iwpm_util.h         |  2 +-
 drivers/infiniband/core/mad_priv.h          |  2 +-
 drivers/infiniband/core/multicast.c         | 30 ++++++++--------
 drivers/infiniband/core/uverbs.h            |  2 +-
 drivers/infiniband/core/uverbs_main.c       | 12 +++----
 drivers/infiniband/hw/cxgb4/cq.c            |  6 ++--
 drivers/infiniband/hw/cxgb4/ev.c            |  8 ++---
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h      |  2 +-
 drivers/infiniband/hw/hns/hns_roce_cq.c     |  8 ++---
 drivers/infiniband/hw/hns/hns_roce_device.h |  6 ++--
 drivers/infiniband/hw/hns/hns_roce_hem.c    | 33 +++++++++---------
 drivers/infiniband/hw/hns/hns_roce_hem.h    |  4 +--
 drivers/infiniband/hw/hns/hns_roce_qp.c     | 12 +++----
 drivers/infiniband/hw/hns/hns_roce_srq.c    |  8 ++---
 drivers/infiniband/hw/i40iw/i40iw.h         |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_cm.c      | 54 ++++++++++++++---------------
 drivers/infiniband/hw/i40iw/i40iw_cm.h      |  4 +--
 drivers/infiniband/hw/i40iw/i40iw_main.c    |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_puda.h    |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_utils.c   | 10 +++---
 drivers/infiniband/ulp/ipoib/ipoib.h        |  4 +--
 drivers/infiniband/ulp/ipoib/ipoib_main.c   |  8 ++---
 25 files changed, 123 insertions(+), 121 deletions(-)

Comments

Leon Romanovsky May 16, 2021, 10:18 a.m. UTC | #1
On Fri, May 14, 2021 at 10:11:33AM +0800, Weihang Li wrote:
> There are some refcnt in type of atomic_t in RDMA subsystem, almost all of
> them is wrote before the refcount_t is acheived in kernel. refcount_t is
> better than integer for reference counting, it will WARN on
> overflow/underflow and avoid use-after-free risks.
> 
> Weihang Li (6):
>   RDMA/core: Use refcount_t instead of atomic_t for reference counting
>   RDMA/hns: Use refcount_t instead of atomic_t for reference counting
>   RDMA/hns: Use refcount_t APIs for HEM
>   RDMA/cxgb4: Use refcount_t instead of atomic_t for reference counting
>   RDMA/i40iw: Use refcount_t instead of atomic_t for reference counting
>   RDMA/ipoib: Use refcount_t instead of atomic_t for reference counting

This series causes to the following splat when restarting mlx4 or mlx5 drivers.

[  227.529930] ------------[ cut here ]------------
[  227.530966] refcount_t: addition on 0; use-after-free.
[  227.531890] WARNING: CPU: 1 PID: 579 at lib/refcount.c:25 refcount_warn_saturate+0xb4/0x130
[  227.533427] Modules linked in: bonding nf_tables ipip tunnel4 geneve ip6_gre ip6_tunnel tunnel6 ip_gre gre mlx5_ib mlx5_core ptp pps_core rdma_ucm ib_uverbs ib_ipoib ib_umad openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core overlay fuse [last unloaded: ib_uverbs]
[  227.539872] CPU: 1 PID: 579 Comm: kworker/u20:7 Not tainted 5.13.0-rc1_2021_05_14_02_56_31_ #1
[  227.541458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  227.543424] Workqueue: ipoib_wq ipoib_mcast_join_task [ib_ipoib]
[  227.544497] RIP: 0010:refcount_warn_saturate+0xb4/0x130
[  227.545515] Code: 5b c3 0f b6 1d 69 19 40 01 80 fb 01 0f 87 60 fd 52 00 83 e3 01 75 91 48 c7 c7 88 9a 57 82 c6 05 4d 19 40 01 01 e8 35 50 4e 00 <0f> 0b 5b c3 0f b6 1d 3f 19 40 01 80 fb 01 0f 87 70 fd 52 00 83 e3
[  227.548723] RSP: 0018:ffff888133d23cb0 EFLAGS: 00010082
[  227.549739] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8882f58a78b8
[  227.551077] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff8882f58a78b0
[  227.552306] RBP: ffff888133d23d30 R08: ffffffff83b5fd40 R09: 0000000000000001
[  227.553589] R10: 0000000000000001 R11: 746e756f63666572 R12: ffff88813e00c700
[  227.554885] R13: ffff8881182a3720 R14: ffff88810d06f938 R15: ffff8881182a3600
[  227.556104] FS:  0000000000000000(0000) GS:ffff8882f5880000(0000) knlGS:0000000000000000
[  227.557594] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  227.558719] CR2: 000055fb4a2ee768 CR3: 000000010303e006 CR4: 0000000000370ea0
[  227.559940] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  227.561223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  227.562566] Call Trace:
[  227.563088]  ib_sa_join_multicast+0x4ec/0x580 [ib_core]
[  227.564059]  ipoib_mcast_join+0x16d/0x220 [ib_ipoib]
[  227.564961]  ? ipoib_mcast_join+0x220/0x220 [ib_ipoib]
[  227.565991]  ipoib_mcast_join_task+0x16e/0x320 [ib_ipoib]
[  227.567014]  ? process_one_work+0x1d7/0x5b0
[  227.567809]  process_one_work+0x25a/0x5b0
[  227.568587]  worker_thread+0x4f/0x3e0
[  227.569349]  ? process_one_work+0x5b0/0x5b0
[  227.570210]  kthread+0x129/0x140
[  227.570874]  ? __kthread_bind_mask+0x60/0x60
[  227.571677]  ret_from_fork+0x1f/0x30
[  227.572416] irq event stamp: 150582
[  227.573190] hardirqs last  enabled at (150581): [<ffffffff81c4b337>] _raw_spin_unlock_irqrestore+0x47/0x50
[  227.575046] hardirqs last disabled at (150582): [<ffffffff81c4b130>] _raw_spin_lock_irqsave+0x50/0x60
[  227.576793] softirqs last  enabled at (150576): [<ffffffffa00bcbef>] ipoib_mcast_join_task+0xef/0x320 [ib_ipoib]
[  227.578657] softirqs last disabled at (150574): [<ffffffffa00bcb9f>] ipoib_mcast_join_task+0x9f/0x320 [ib_ipoib]
[  227.580416] ---[ end trace 52a055538498ac03 ]---

Thanks
Weihang Li May 17, 2021, 7:21 a.m. UTC | #2
On 2021/5/16 18:18, Leon Romanovsky wrote:
> On Fri, May 14, 2021 at 10:11:33AM +0800, Weihang Li wrote:
>> There are some refcnt in type of atomic_t in RDMA subsystem, almost all of
>> them is wrote before the refcount_t is acheived in kernel. refcount_t is
>> better than integer for reference counting, it will WARN on
>> overflow/underflow and avoid use-after-free risks.
>>
>> Weihang Li (6):
>>   RDMA/core: Use refcount_t instead of atomic_t for reference counting
>>   RDMA/hns: Use refcount_t instead of atomic_t for reference counting
>>   RDMA/hns: Use refcount_t APIs for HEM
>>   RDMA/cxgb4: Use refcount_t instead of atomic_t for reference counting
>>   RDMA/i40iw: Use refcount_t instead of atomic_t for reference counting
>>   RDMA/ipoib: Use refcount_t instead of atomic_t for reference counting
> 
> This series causes to the following splat when restarting mlx4 or mlx5 drivers.
> 
> [  227.529930] ------------[ cut here ]------------
> [  227.530966] refcount_t: addition on 0; use-after-free.
> [  227.531890] WARNING: CPU: 1 PID: 579 at lib/refcount.c:25 refcount_warn_saturate+0xb4/0x130
> [  227.533427] Modules linked in: bonding nf_tables ipip tunnel4 geneve ip6_gre ip6_tunnel tunnel6 ip_gre gre mlx5_ib mlx5_core ptp pps_core rdma_ucm ib_uverbs ib_ipoib ib_umad openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core overlay fuse [last unloaded: ib_uverbs]
> [  227.539872] CPU: 1 PID: 579 Comm: kworker/u20:7 Not tainted 5.13.0-rc1_2021_05_14_02_56_31_ #1
> [  227.541458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> [  227.543424] Workqueue: ipoib_wq ipoib_mcast_join_task [ib_ipoib]
> [  227.544497] RIP: 0010:refcount_warn_saturate+0xb4/0x130
> [  227.545515] Code: 5b c3 0f b6 1d 69 19 40 01 80 fb 01 0f 87 60 fd 52 00 83 e3 01 75 91 48 c7 c7 88 9a 57 82 c6 05 4d 19 40 01 01 e8 35 50 4e 00 <0f> 0b 5b c3 0f b6 1d 3f 19 40 01 80 fb 01 0f 87 70 fd 52 00 83 e3
> [  227.548723] RSP: 0018:ffff888133d23cb0 EFLAGS: 00010082
> [  227.549739] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8882f58a78b8
> [  227.551077] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff8882f58a78b0
> [  227.552306] RBP: ffff888133d23d30 R08: ffffffff83b5fd40 R09: 0000000000000001
> [  227.553589] R10: 0000000000000001 R11: 746e756f63666572 R12: ffff88813e00c700
> [  227.554885] R13: ffff8881182a3720 R14: ffff88810d06f938 R15: ffff8881182a3600
> [  227.556104] FS:  0000000000000000(0000) GS:ffff8882f5880000(0000) knlGS:0000000000000000
> [  227.557594] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  227.558719] CR2: 000055fb4a2ee768 CR3: 000000010303e006 CR4: 0000000000370ea0
> [  227.559940] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  227.561223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  227.562566] Call Trace:
> [  227.563088]  ib_sa_join_multicast+0x4ec/0x580 [ib_core]
> [  227.564059]  ipoib_mcast_join+0x16d/0x220 [ib_ipoib]
> [  227.564961]  ? ipoib_mcast_join+0x220/0x220 [ib_ipoib]
> [  227.565991]  ipoib_mcast_join_task+0x16e/0x320 [ib_ipoib]
> [  227.567014]  ? process_one_work+0x1d7/0x5b0
> [  227.567809]  process_one_work+0x25a/0x5b0
> [  227.568587]  worker_thread+0x4f/0x3e0
> [  227.569349]  ? process_one_work+0x5b0/0x5b0
> [  227.570210]  kthread+0x129/0x140
> [  227.570874]  ? __kthread_bind_mask+0x60/0x60
> [  227.571677]  ret_from_fork+0x1f/0x30
> [  227.572416] irq event stamp: 150582
> [  227.573190] hardirqs last  enabled at (150581): [<ffffffff81c4b337>] _raw_spin_unlock_irqrestore+0x47/0x50
> [  227.575046] hardirqs last disabled at (150582): [<ffffffff81c4b130>] _raw_spin_lock_irqsave+0x50/0x60
> [  227.576793] softirqs last  enabled at (150576): [<ffffffffa00bcbef>] ipoib_mcast_join_task+0xef/0x320 [ib_ipoib]
> [  227.578657] softirqs last disabled at (150574): [<ffffffffa00bcb9f>] ipoib_mcast_join_task+0x9f/0x320 [ib_ipoib]
> [  227.580416] ---[ end trace 52a055538498ac03 ]---
> 
> Thanks
> 

Hi Leon,

Thanks for the test, it seems that the problem is caused by the refcount of
mcast_group. I will reply to this email once I have made progress.

Weihang