[for-rc,v3,0/6] RDMA/rxe: Various bug fixes.

Message ID 20210909204456.7476-1-rpearsonhpe@gmail.com (mailing list archive)

Message

Bob Pearson Sept. 9, 2021, 8:44 p.m. UTC
This series of patches implements several bug fixes and minor
cleanups of the rxe driver. Specifically these fix a bug exposed
by blktest.

They apply cleanly to both
commit 2169b908894df2ce83e7eb4a399d3224b2635126 (origin/for-rc, for-rc)
commit 6a217437f9f5482a3f6f2dc5fcd27cf0f62409ac (HEAD -> for-next,
	origin/wip/jgg-for-next, origin/for-next, origin/HEAD)

These are being resubmitted to for-rc instead of for-next.

The v2 version had a typo which broke clean application to for-next.
Additionally in v3 the order of the patches was changed to make
it a little cleaner.

The first patch is a repeat of an earlier patch after rebasing.
It adds memory barriers to kernel-to-kernel queues. The logic for this
is the same as an earlier patch that only treated user-to-kernel queues.
Without this patch kernel-to-kernel queues are expected to intermittently
fail at low frequency, as was seen for the other queues.
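
As a rough userspace illustration of why the barriers matter (this is not the driver code), the sketch below models a single-producer, single-consumer ring with C11 acquire/release atomics. The kernel would use its own primitives, such as smp_load_acquire() and smp_store_release(); the names struct ring, ring_push() and ring_pop() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u                /* must be a power of two */
#define RING_MASK  (RING_SLOTS - 1u)

struct ring {
	_Atomic uint32_t producer;   /* written only by the producer */
	_Atomic uint32_t consumer;   /* written only by the consumer */
	uint32_t slot[RING_SLOTS];
};

/* Producer side: fill the slot first, then publish the new index. */
static bool ring_push(struct ring *r, uint32_t value)
{
	assert(r);
	uint32_t prod = atomic_load_explicit(&r->producer, memory_order_relaxed);
	/* acquire: pairs with the release store in ring_pop() */
	uint32_t cons = atomic_load_explicit(&r->consumer, memory_order_acquire);

	if (((prod + 1u) & RING_MASK) == cons)
		return false;        /* ring is full */

	r->slot[prod] = value;
	/* release: the slot write becomes visible before the index update */
	atomic_store_explicit(&r->producer, (prod + 1u) & RING_MASK,
			      memory_order_release);
	return true;
}

/* Consumer side: observe the index first, then read the slot. */
static bool ring_pop(struct ring *r, uint32_t *value)
{
	assert(r && value);
	uint32_t cons = atomic_load_explicit(&r->consumer, memory_order_relaxed);
	/* acquire: pairs with the release store in ring_push() */
	uint32_t prod = atomic_load_explicit(&r->producer, memory_order_acquire);

	if (cons == prod)
		return false;        /* ring is empty */

	*value = r->slot[cons];
	/* release: the slot read completes before the slot is handed back */
	atomic_store_explicit(&r->consumer, (cons + 1u) & RING_MASK,
			      memory_order_release);
	return true;
}
```

Without the release/acquire pairing, a consumer on another CPU could see the new producer index before the slot contents, which is the kind of low-frequency intermittent failure described above.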

The second patch is also a repeat after rebasing. It fixes a multicast
bug.

The third patch cleans up the state and type enums used by MRs.

The fourth patch separates the keys in rxe_mr and ib_mr. This allows
the following sequence seen in the srp driver to work correctly.

	do {
		ib_post_send( IB_WR_LOCAL_INV )
		ib_update_fast_reg_key()
		ib_map_mr_sg()
		ib_post_send( IB_WR_REG_MR )
	} while ( !done )
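
To make the key handling in that loop concrete, here is a small model, offered only as a sketch. ib_update_fast_reg_key() replaces the low 8 bits of the MR's lkey/rkey immediately, before the REG_MR work request has executed; the model keeps a second copy of the key that changes only when the work request runs. The keymodel_* names and the two-field layout are assumptions for illustration, not the driver's actual structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 'sw_rkey' is the copy the ULP updates,
 * 'hw_rkey' is the copy checked when a remote access arrives. */
struct keymodel_mr {
	uint32_t sw_rkey;
	uint32_t hw_rkey;
	int valid;
};

/* Same arithmetic as ib_update_fast_reg_key(): only the low 8 bits change. */
static void keymodel_update_key(struct keymodel_mr *mr, uint8_t newkey)
{
	assert(mr);
	mr->sw_rkey = (mr->sw_rkey & 0xffffff00u) | newkey;
}

/* Stand-in for executing IB_WR_LOCAL_INV. */
static void keymodel_local_inv(struct keymodel_mr *mr)
{
	mr->valid = 0;
}

/* Stand-in for executing IB_WR_REG_MR: adopt the key carried in the WR. */
static void keymodel_reg_mr(struct keymodel_mr *mr, uint32_t wr_key)
{
	mr->hw_rkey = wr_key;
	mr->valid = 1;
}

/* A remote access is checked against the driver-side copy only. */
static int keymodel_rkey_ok(const struct keymodel_mr *mr, uint32_t rkey)
{
	return mr->valid && mr->hw_rkey == rkey;
}
```

In this model the new key is not accepted until the REG_MR step runs, and the old key stops working at the same moment, which is the ordering the srp sequence depends on.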

The fifth patch creates duplicate mapping tables for fast MRs. This
prevents rkeys referencing fast MRs from accessing data from an updated
map after the call to ib_map_mr_sg() by keeping the new and old
mappings separate and atomically swapping them when a reg mr WR is
executed.
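
The two-table idea can be sketched as follows. This is a single-threaded model under assumed names (mapmodel_*); it does not show how the driver makes the swap safe against concurrent lookups, only that the map step writes the "next" set while lookups keep reading the "current" one until the swap.

```c
#include <assert.h>
#include <stdint.h>

#define MAPMODEL_MAX_PAGES 4u

struct mapmodel_set {
	uint64_t page_addr[MAPMODEL_MAX_PAGES];
	unsigned int nr_pages;
};

struct mapmodel_fmr {
	struct mapmodel_set sets[2];
	struct mapmodel_set *cur;    /* read by rkey lookups */
	struct mapmodel_set *next;   /* written by the map step */
};

static void mapmodel_init(struct mapmodel_fmr *mr)
{
	assert(mr);
	mr->sets[0].nr_pages = 0;
	mr->sets[1].nr_pages = 0;
	mr->cur = &mr->sets[0];
	mr->next = &mr->sets[1];
}

/* Stand-in for the ib_map_mr_sg() step: only the 'next' set is touched. */
static void mapmodel_map(struct mapmodel_fmr *mr, const uint64_t *pages,
			 unsigned int n)
{
	assert(mr && pages && n <= MAPMODEL_MAX_PAGES);
	for (unsigned int i = 0; i < n; i++)
		mr->next->page_addr[i] = pages[i];
	mr->next->nr_pages = n;
}

/* Stand-in for executing the REG_MR work request: swap the two sets. */
static void mapmodel_reg_mr(struct mapmodel_fmr *mr)
{
	struct mapmodel_set *tmp = mr->cur;

	mr->cur = mr->next;
	mr->next = tmp;
}

/* Lookup as a remote access would see it; returns 0 when out of range. */
static uint64_t mapmodel_lookup(const struct mapmodel_fmr *mr, unsigned int i)
{
	return i < mr->cur->nr_pages ? mr->cur->page_addr[i] : 0;
}
```

A lookup issued between the map step and the REG_MR step still resolves through the old table, so an rkey that is still valid never sees a half-updated mapping.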

The sixth patch checks the type of MRs which receive local or remote
invalidate operations to prevent invalidating user MRs.
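
A minimal sketch of such a type check, with hypothetical enum and function names (the driver's own enums are the ones cleaned up by the third patch and may differ):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical MR kinds for this example only. */
enum invmodel_type {
	INVMODEL_MR_DMA,
	INVMODEL_MR_USER,
	INVMODEL_MR_FAST_REG,
};

struct invmodel_mr {
	enum invmodel_type type;
	int valid;
};

/* Refuse to invalidate anything that is not a fast-registration MR. */
static int invmodel_invalidate(struct invmodel_mr *mr)
{
	assert(mr);
	if (mr->type != INVMODEL_MR_FAST_REG)
		return -EINVAL;      /* the MR is left untouched */

	mr->valid = 0;
	return 0;
}
```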

Bob Pearson (6):
  RDMA/rxe: Add memory barriers to kernel queues
  RDMA/rxe: Fix memory allocation while locked
  RDMA/rxe: Cleanup MR status and type enums
  RDMA/rxe: Separate HW and SW l/rkeys
  RDMA/rxe: Create duplicate mapping tables for FMRs
  RDMA/rxe: Only allow invalidate for appropriate MRs

 drivers/infiniband/sw/rxe/rxe_comp.c  |  10 +-
 drivers/infiniband/sw/rxe/rxe_cq.c    |  25 +--
 drivers/infiniband/sw/rxe/rxe_loc.h   |   2 +
 drivers/infiniband/sw/rxe/rxe_mcast.c |   2 +-
 drivers/infiniband/sw/rxe/rxe_mr.c    | 267 +++++++++++++++++++-------
 drivers/infiniband/sw/rxe/rxe_mw.c    |  36 ++--
 drivers/infiniband/sw/rxe/rxe_qp.c    |  10 +-
 drivers/infiniband/sw/rxe/rxe_queue.h |  73 ++-----
 drivers/infiniband/sw/rxe/rxe_req.c   |  35 +---
 drivers/infiniband/sw/rxe/rxe_resp.c  |  38 +---
 drivers/infiniband/sw/rxe/rxe_srq.c   |   2 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c |  92 +++------
 drivers/infiniband/sw/rxe/rxe_verbs.h |  48 ++---
 13 files changed, 305 insertions(+), 335 deletions(-)

Comments

Bart Van Assche Sept. 9, 2021, 9:52 p.m. UTC | #1
On 9/9/21 1:44 PM, Bob Pearson wrote:
> This series of patches implements several bug fixes and minor
> cleanups of the rxe driver. Specifically these fix a bug exposed
> by blktest.
> 
> They apply cleanly to both
> commit 2169b908894df2ce83e7eb4a399d3224b2635126 (origin/for-rc, for-rc)
> commit 6a217437f9f5482a3f6f2dc5fcd27cf0f62409ac (HEAD -> for-next,
> 	origin/wip/jgg-for-next, origin/for-next, origin/HEAD)
> 
> These are being resubmitted to for-rc instead of for-next.

Hi Bob,

Thanks for having rebased and reposted this patch series. I have applied
this series on top of commit 2169b908894d ("IB/hfi1: make hist static").
A kernel bug was triggered while running test srp/001. I have attached
the kernel configuration used in my test to this email.

Thanks,

Bart.



ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:5054:00ff:fe86:7464, t_port_id 5054:00ff:fe86:7464:5054:00ff:fe86:7464 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:5054:00ff:fe86:7464); pkey 0xffff
BUG: unable to handle page fault for address: ffffc900e357d614
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 100000067 P4D 100000067 PUD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 26 PID: 148 Comm: ksoftirqd/26 Tainted: G            E     5.14.0-rc6-dbg+ #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
RIP: 0010:rxe_completer+0x96d/0x1050 [rdma_rxe]
Code: e0 49 8b 44 24 08 44 89 e9 41 d3 e6 4e 8d a4 30 80 01 00 00 4d 85 e4 0f 84 f9 00 00 00 49 8d bc 24 94 00 00 00 e8 73 a8 b1 e0 <41> 8b 84 24 94 00 00 00 85 c0 0f 84 df 00 00 00 83 f8 03 0f 84 bf
RSP: 0018:ffff8881014075f8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88813c67c000 RCX: dffffc0000000000
RDX: 0000000000000007 RSI: ffffffff826920c0 RDI: ffffc900e357d614
RBP: ffff8881014076e8 R08: ffffffffa09b228d R09: ffff88813c67c57b
R10: ffffed10278cf8af R11: 0000000000000000 R12: ffffc900e357d580
R13: 000000000000000a R14: 00000000d9c99400 R15: ffff8881515ddd08
FS:  0000000000000000(0000) GS:ffff88842d100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc900e357d614 CR3: 0000000002e29005 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
  rxe_do_task+0xdd/0x160 [rdma_rxe]
  rxe_run_task+0x67/0x80 [rdma_rxe]
  rxe_comp_queue_pkt+0x75/0x80 [rdma_rxe]
  rxe_rcv+0x345/0x480 [rdma_rxe]
  rxe_xmit_packet+0x1af/0x300 [rdma_rxe]
  send_ack.isra.0+0x88/0xd0 [rdma_rxe]
  rxe_responder+0xf4c/0x15e0 [rdma_rxe]
  rxe_do_task+0xdd/0x160 [rdma_rxe]
  rxe_run_task+0x67/0x80 [rdma_rxe]
  rxe_resp_queue_pkt+0x5a/0x60 [rdma_rxe]
  rxe_rcv+0x370/0x480 [rdma_rxe]
  rxe_xmit_packet+0x1af/0x300 [rdma_rxe]
  rxe_requester+0x4f4/0xe80 [rdma_rxe]
  rxe_do_task+0xdd/0x160 [rdma_rxe]
  tasklet_action_common.constprop.0+0x168/0x1b0
  tasklet_action+0x44/0x60
  __do_softirq+0x1db/0x6ed
  run_ksoftirqd+0x37/0x60
  smpboot_thread_fn+0x302/0x410
  kthread+0x1f6/0x220
  ret_from_fork+0x1f/0x30
Modules linked in: ib_srp(E) scsi_transport_srp(E) target_core_user(E) uio(E) target_core_pscsi(E) target_core_file(E) ib_srpt(E) target_core_iblock(E) target_core_mod(E) ib_umad(E) rdma_ucm(E) ib_iser(E) libiscsi(E) scsi_transport_iscsi(E) rdma_cm(E) iw_cm(E) 
scsi_debug(E) ib_cm(E) rdma_rxe(E) ip6_udp_tunnel(E) udp_tunnel(E) ib_uverbs(E) null_blk(E) ib_core(E) brd(E) af_packet(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) 
nft_chain_nat(E) nf_tables(E) ebtable_nat(E) iTCO_wdt(E) watchdog(E) ebtable_broute(E) intel_rapl_msr(E) intel_pmc_bxt(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) 
iptable_mangle(E) iptable_raw(E) ip_set(E) nfnetlink(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) rfkill(E) iptable_filter(E) ip_tables(E) x_tables(E) bpfilter(E) intel_rapl_common(E)
  iosf_mbi(E) isst_if_common(E) i2c_i801(E) pcspkr(E) i2c_smbus(E) virtio_net(E) lpc_ich(E) virtio_balloon(E) net_failover(E) failover(E) tiny_power_button(E) button(E) fuse(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) aesni_intel(E) 
crypto_simd(E) cryptd(E) sr_mod(E) serio_raw(E) cdrom(E) virtio_gpu(E) virtio_dma_buf(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) cec(E) drm(E) qemu_fw_cfg(E) sg(E) nbd(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) 
scsi_dh_alua(E) virtio_rng(E)
CR2: ffffc900e357d614
---[ end trace 0667a278da47193a ]---
RIP: 0010:rxe_completer+0x96d/0x1050 [rdma_rxe]
Code: e0 49 8b 44 24 08 44 89 e9 41 d3 e6 4e 8d a4 30 80 01 00 00 4d 85 e4 0f 84 f9 00 00 00 49 8d bc 24 94 00 00 00 e8 73 a8 b1 e0 <41> 8b 84 24 94 00 00 00 85 c0 0f 84 df 00 00 00 83 f8 03 0f 84 bf
RSP: 0018:ffff8881014075f8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88813c67c000 RCX: dffffc0000000000
RDX: 0000000000000007 RSI: ffffffff826920c0 RDI: ffffc900e357d614
RBP: ffff8881014076e8 R08: ffffffffa09b228d R09: ffff88813c67c57b
R10: ffffed10278cf8af R11: 0000000000000000 R12: ffffc900e357d580
R13: 000000000000000a R14: 00000000d9c99400 R15: ffff8881515ddd08
FS:  0000000000000000(0000) GS:ffff88842d100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc900e357d614 CR3: 0000000002e29005 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
Rebooting in 90 seconds..
Pearson, Robert B Sept. 10, 2021, 7:38 p.m. UTC | #2
Bart,

I was able to run this test case but it is not failing. On my system it passes in ~1sec.
I have several questions about your system setup.

1. Which rdma-core are you running? Out of box or the github tree?
2. Can you run ib_send_bw? Python test suite in rdma-core?
3. Where did you get the kernel bits? Which git tree? Which branch?

Thanks,

Bob Pearson

Bart Van Assche Sept. 10, 2021, 8:23 p.m. UTC | #3
On 9/10/21 12:38 PM, Pearson, Robert B wrote:
> 1. Which rdma-core are you running? Out of box or the github tree?

I'm using the rdma-core package included in openSUSE Tumbleweed. blktests
pass with that rdma-core package against older kernel versions so I think
the rdma-core package is fine. The version number of the rdma-core package
I'm using is as follows:
$ rpm -q rdma-core
rdma-core-36.0-1.1.x86_64

The rdma tool comes from the iproute2 package:
$ rpm -qf /sbin/rdma
iproute2-5.13-1.1.x86_64

> 3. Where did you get the kernel bits? Which git tree? Which branch?

Hmm ... wasn't that mentioned in my previous email? I mentioned a commit
SHA and these SHA numbers are unique and unambiguous. Anyway: commit
2169b908894d comes from the for-rc branch of the following git repository:
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git.

Bart.
Bob Pearson Sept. 10, 2021, 9:16 p.m. UTC | #4
On 9/10/21 3:23 PM, Bart Van Assche wrote:
> On 9/10/21 12:38 PM, Pearson, Robert B wrote:
>> 1. Which rdma-core are you running? Out of box or the github tree?
> 
> I'm using the rdma-core package included in openSUSE Tumbleweed. blktests
> pass with that rdma-core package against older kernel versions so I think
> the rdma-core package is fine. The version number of the rdma-core package
> I'm using is as follows:
> $ rpm -q rdma-core
> rdma-core-36.0-1.1.x86_64
> 
> The rdma tool comes from the iproute2 package:
> $ rpm -qf /sbin/rdma
> iproute2-5.13-1.1.x86_64
> 
>> 3. Where did you get the kernel bits? Which git tree? Which branch?
> 
> Hmm ... wasn't that mentioned in my previous email? I mentioned a commit
> SHA and these SHA numbers are unique and unambiguous. Anyway: commit
> 2169b908894d comes from the for-rc branch of the following git repository:
> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git.
> 
> Bart.
> 
> 

You'd be surprised how much I don't know. I do know the numbers are unique but I
haven't the faintest idea how to decode them into useful strings.

In theory you are correct and rdma-core and kernels are supposed to be forwards and
backwards compatible but that is a goal and sometimes regressions do occur. I can try
to run with that version just to make sure.

There is a problem I have seen where some newer distros do not create the default IPv6
link-local address from the MAC address. They randomize it (Ubuntu does this) and rxe is
broken as a result. I end up having to add a line like

sudo ip addr add dev enp6s0 fe80::b62e:99ff:fef9:fa2e/64
  (where the MAC address is b4:2e:99:f9:fa:2e) just before the line
sudo rdma link add rxe_1 type rxe netdev enp6s0

But, when this is an issue rxe is really broken and almost nothing works so that may not
be an issue for you.
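
For reference, the address in the example above follows the modified EUI-64 rule: flip the universal/local bit (0x02) of the first MAC octet and insert ff:fe in the middle. The sketch below shows that arithmetic; the function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Build the 64-bit interface identifier from a 48-bit MAC address. */
static uint64_t mac_to_eui64(const uint8_t mac[6])
{
	assert(mac);
	return ((uint64_t)(mac[0] ^ 0x02u) << 56) |
	       ((uint64_t)mac[1] << 48) |
	       ((uint64_t)mac[2] << 40) |
	       ((uint64_t)0xffu << 32) |
	       ((uint64_t)0xfeu << 24) |
	       ((uint64_t)mac[3] << 16) |
	       ((uint64_t)mac[4] << 8) |
	       (uint64_t)mac[5];
}

/* Format the link-local address as text; returns snprintf()'s result. */
static int mac_to_link_local(const uint8_t mac[6], char *out, size_t len)
{
	uint64_t id = mac_to_eui64(mac);

	return snprintf(out, len, "fe80::%x:%x:%x:%x",
			(unsigned int)((id >> 48) & 0xffffu),
			(unsigned int)((id >> 32) & 0xffffu),
			(unsigned int)((id >> 16) & 0xffffu),
			(unsigned int)(id & 0xffffu));
}
```

With the MAC address b4:2e:99:f9:fa:2e, the interface identifier is b62e:99ff:fef9:fa2e, which matches the address passed to "ip addr add" above.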

I will try to recreate your setup and retest.

Thanks,

Bob
Bob Pearson Sept. 10, 2021, 9:47 p.m. UTC | #5
On 9/10/21 3:23 PM, Bart Van Assche wrote:
> On 9/10/21 12:38 PM, Pearson, Robert B wrote:
>> 1. Which rdma-core are you running? Out of box or the github tree?
> 
> I'm using the rdma-core package included in openSUSE Tumbleweed. blktests
> pass with that rdma-core package against older kernel versions so I think
> the rdma-core package is fine. The version number of the rdma-core package
> I'm using is as follows:
> $ rpm -q rdma-core
> rdma-core-36.0-1.1.x86_64
> 
> The rdma tool comes from the iproute2 package:
> $ rpm -qf /sbin/rdma
> iproute2-5.13-1.1.x86_64
> 
>> 3. Where did you get the kernel bits? Which git tree? Which branch?
> 
> Hmm ... wasn't that mentioned in my previous email? I mentioned a commit
> SHA and these SHA numbers are unique and unambiguous. Anyway: commit
> 2169b908894d comes from the for-rc branch of the following git repository:
> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git.
> 
> Bart.
> 
> 

OK I checked out the kernel with the SHA number above and applied the patch series
and rebuilt and reinstalled the kernel. I checked out v36.0 of rdma-core and rebuilt
that. rdma is version 5.9.0 but I doubt that will have any effect. My startup script
is

    export LD_LIBRARY_PATH=/home/bob/src/rdma-core/build/lib/:/usr/local/lib:/usr/lib



    sudo ip link set dev enp0s3 mtu 8500

    sudo ip addr add dev enp0s3 fe80::0a00:27ff:fe94:8a69/64

    sudo rdma link add rxe0 type rxe netdev enp0s3


I am running on a Virtualbox VM instance of Ubuntu 21.04 with 20 cores and 8GB of RAM.

The test looks like

    sudo ./check -q srp/001

    srp/001 (Create and remove LUNs)                             [passed]

        runtime  1.174s  ...  1.236s

There were no issues. 

Any guesses what else to look at?

Thanks,

Bob
Bob Pearson Sept. 10, 2021, 9:50 p.m. UTC | #6
On 9/10/21 4:47 PM, Bob Pearson wrote:
> On 9/10/21 3:23 PM, Bart Van Assche wrote:
>> On 9/10/21 12:38 PM, Pearson, Robert B wrote:
>>> 1. Which rdma-core are you running? Out of box or the github tree?
>>
>> I'm using the rdma-core package included in openSUSE Tumbleweed. blktests
>> pass with that rdma-core package against older kernel versions so I think
>> the rdma-core package is fine. The version number of the rdma-core package
>> I'm using is as follows:
>> $ rpm -q rdma-core
>> rdma-core-36.0-1.1.x86_64
>>
>> The rdma tool comes from the iproute2 package:
>> $ rpm -qf /sbin/rdma
>> iproute2-5.13-1.1.x86_64
>>
>>> 3. Where did you get the kernel bits? Which git tree? Which branch?
>>
>> Hmm ... wasn't that mentioned in my previous email? I mentioned a commit
>> SHA and these SHA numbers are unique and unambiguous. Anyway: commit
>> 2169b908894d comes from the for-rc branch of the following git repository:
>> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git.
>>
>> Bart.
>>
>>
> 
> OK I checked out the kernel with the SHA number above and applied the patch series
> and rebuilt and reinstalled the kernel. I checked out v36.0 of rdma-core and rebuilt
> that. rdma is version 5.9.0 but I doubt that will have any effect. My startup script
> is
> 
>     export LD_LIBRARY_PATH=/home/bob/src/rdma-core/build/lib/:/usr/local/lib:/usr/lib
> 
> 
> 
>     sudo ip link set dev enp0s3 mtu 8500
> 
>     sudo ip addr add dev enp0s3 fe80::0a00:27ff:fe94:8a69/64
> 
>     sudo rdma link add rxe0 type rxe netdev enp0s3
> 
> 
> I am running on a Virtualbox VM instance of Ubuntu 21.04 with 20 cores and 8GB of RAM.
> 
> The test looks like
> 
>     sudo ./check -q srp/001
> 
>     srp/001 (Create and remove LUNs)                             [passed]
> 
>         runtime  1.174s  ...  1.236s
> 
> There were no issues. 
> 
> Any guesses what else to look at?
> 
> Thanks,
> 
> Bob
> 

The 8500 is not required. It runs fine with 4K MTU just as well.
Bart Van Assche Sept. 10, 2021, 10:07 p.m. UTC | #7
On 9/10/21 2:47 PM, Bob Pearson wrote:
> OK I checked out the kernel with the SHA number above and applied the patch series
> and rebuilt and reinstalled the kernel. I checked out v36.0 of rdma-core and rebuilt
> that. rdma is version 5.9.0 but I doubt that will have any effect. My startup script
> is
> 
>      export LD_LIBRARY_PATH=/home/bob/src/rdma-core/build/lib/:/usr/local/lib:/usr/lib
> 
> 
> 
>      sudo ip link set dev enp0s3 mtu 8500
> 
>      sudo ip addr add dev enp0s3 fe80::0a00:27ff:fe94:8a69/64
> 
>      sudo rdma link add rxe0 type rxe netdev enp0s3
> 
> 
> I am running on a Virtualbox VM instance of Ubuntu 21.04 with 20 cores and 8GB of RAM.
> 
> The test looks like
> 
>      sudo ./check -q srp/001
> 
>      srp/001 (Create and remove LUNs)                             [passed]
> 
>          runtime  1.174s  ...  1.236s
> 
> There were no issues.
> 
> Any guesses what else to look at?

The test I ran is different. I did not run any of the ip link / ip addr /
rdma link commands since the blktests scripts already run the rdma link
command. The bug I reported in my previous email is reproducible and
triggers a VM halt.

Are we using the same kernel config? I attached my kernel config to my
previous email. The source code location of the crash address is as
follows:

(gdb) list *(rxe_completer+0x96d)
0x228d is in rxe_completer (drivers/infiniband/sw/rxe/rxe_comp.c:149).
144              */
145             wqe = queue_head(qp->sq.queue, QUEUE_TYPE_FROM_CLIENT);
146             *wqe_p = wqe;
147
148             /* no WQE or requester has not started it yet */
149             if (!wqe || wqe->state == wqe_state_posted)
150                     return pkt ? COMPST_DONE : COMPST_EXIT;
151
152             /* WQE does not require an ack */
153             if (wqe->state == wqe_state_done)

The disassembly output is as follows:

drivers/infiniband/sw/rxe/rxe_comp.c:
149             if (!wqe || wqe->state == wqe_state_posted)
    0x0000000000002277 <+2391>:  test   %r12,%r12
    0x000000000000227a <+2394>:  je     0x2379 <rxe_completer+2649>
    0x0000000000002280 <+2400>:  lea    0x94(%r12),%rdi
    0x0000000000002288 <+2408>:  call   0x228d <rxe_completer+2413>
    0x000000000000228d <+2413>:  mov    0x94(%r12),%eax
    0x0000000000002295 <+2421>:  test   %eax,%eax
    0x0000000000002297 <+2423>:  je     0x237c <rxe_completer+2652>

So the instruction that triggers the crash is "mov 0x94(%r12),%eax".
Does consumer_addr() perhaps return an invalid address under certain
circumstances?
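
The register dump supports this reading: the faulting address in CR2 is R12 plus the 0x94 offset used by the mov instruction, so the wqe pointer itself is non-NULL but points at unmapped memory. A small check of that arithmetic, using the values from the oops (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Address of a structure member given the base pointer and member offset. */
static uint64_t member_address(uint64_t base, uint64_t offset)
{
	return base + offset;
}

/* A NULL base would fault at exactly the member offset. */
static int looks_like_null_base(uint64_t fault_addr, uint64_t offset)
{
	return fault_addr == offset;
}
```

With R12 = 0xffffc900e357d580 and offset 0x94, member_address() gives 0xffffc900e357d614, the value reported in CR2.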

Thanks,

Bart.
Bob Pearson Sept. 12, 2021, 2:41 p.m. UTC | #8
On 9/10/21 5:07 PM, Bart Van Assche wrote:
> On 9/10/21 2:47 PM, Bob Pearson wrote:
>> OK I checked out the kernel with the SHA number above and applied the patch series
>> and rebuilt and reinstalled the kernel. I checked out v36.0 of rdma-core and rebuilt
>> that. rdma is version 5.9.0 but I doubt that will have any effect. My startup script
>> is
>>
>>      export LD_LIBRARY_PATH=/home/bob/src/rdma-core/build/lib/:/usr/local/lib:/usr/lib
>>
>>
>>
>>      sudo ip link set dev enp0s3 mtu 8500
>>
>>      sudo ip addr add dev enp0s3 fe80::0a00:27ff:fe94:8a69/64
>>
>>      sudo rdma link add rxe0 type rxe netdev enp0s3
>>
>>
>> I am running on a Virtualbox VM instance of Ubuntu 21.04 with 20 cores and 8GB of RAM.
>>
>> The test looks like
>>
>>      sudo ./check -q srp/001
>>
>>      srp/001 (Create and remove LUNs)                             [passed]
>>
>>          runtime  1.174s  ...  1.236s
>>
>> There were no issues.
>>
>> Any guesses what else to look at?
> 
> The test I ran is different. I did not run any of the ip link / ip addr /
> rdma link commands since the blktests scripts already run the rdma link
> command. The bug I reported in my previous email is reproducible and
> triggers a VM halt.
> 
> Are we using the same kernel config? I attached my kernel config to my
> previous email. The source code location of the crash address is as
> follows:
> 
> (gdb) list *(rxe_completer+0x96d)
> 0x228d is in rxe_completer (drivers/infiniband/sw/rxe/rxe_comp.c:149).
> 144              */
> 145             wqe = queue_head(qp->sq.queue, QUEUE_TYPE_FROM_CLIENT);
> 146             *wqe_p = wqe;
> 147
> 148             /* no WQE or requester has not started it yet */
> 149             if (!wqe || wqe->state == wqe_state_posted)
> 150                     return pkt ? COMPST_DONE : COMPST_EXIT;
> 151
> 152             /* WQE does not require an ack */
> 153             if (wqe->state == wqe_state_done)
> 
> The disassembly output is as follows:
> 
> drivers/infiniband/sw/rxe/rxe_comp.c:
> 149             if (!wqe || wqe->state == wqe_state_posted)
>    0x0000000000002277 <+2391>:  test   %r12,%r12
>    0x000000000000227a <+2394>:  je     0x2379 <rxe_completer+2649>
>    0x0000000000002280 <+2400>:  lea    0x94(%r12),%rdi
>    0x0000000000002288 <+2408>:  call   0x228d <rxe_completer+2413>
>    0x000000000000228d <+2413>:  mov    0x94(%r12),%eax
>    0x0000000000002295 <+2421>:  test   %eax,%eax
>    0x0000000000002297 <+2423>:  je     0x237c <rxe_completer+2652>
> 
> So the instruction that triggers the crash is "mov 0x94(%r12),%eax".
> Does consumer_addr() perhaps return an invalid address under certain
> circumstances?
> 
> Thanks,
> 
> Bart.

The most likely cause of this was fixed by a patch submitted 8/20/2021 by Xiao Yang. It is copied here:

From: Xiao Yang <yangx.jy@fujitsu.com>
To: <linux-rdma@vger.kernel.org>
Cc: <aglo@umich.edu>, <rpearsonhpe@gmail.com>, <zyjzyj2000@gmail.com>,
	<jgg@nvidia.com>, <leon@kernel.org>,
	Xiao Yang <yangx.jy@fujitsu.com>
Subject: [PATCH] RDMA/rxe: Zero out index member of struct rxe_queue
Date: Fri, 20 Aug 2021 19:15:09 +0800	[thread overview]
Message-ID: <20210820111509.172500-1-yangx.jy@fujitsu.com> (raw)

1) New index member of struct rxe_queue is introduced but not zeroed
   so the initial value of index may be random.
2) Current index is not masked off to index_mask.
In such case, producer_addr() and consumer_addr() will get an invalid
address by the random index and then accessing the invalid address
triggers the following panic:
"BUG: unable to handle page fault for address: ffff9ae2c07a1414"

Fix the issue by using kzalloc() to zero out index member.

Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space")
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_queue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c
index 85b812586ed4..72d95398e604 100644
--- a/drivers/infiniband/sw/rxe/rxe_queue.c
+++ b/drivers/infiniband/sw/rxe/rxe_queue.c
@@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem,
 	if (*num_elem < 0)
 		goto err1;
 
-	q = kmalloc(sizeof(*q), GFP_KERNEL);
+	q = kzalloc(sizeof(*q), GFP_KERNEL);
 	if (!q)
 		goto err1;
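
To illustrate the failure mode that patch describes, here is a simplified model of the index-to-address calculation. The structure and names are assumptions made for this sketch, and the element size and garbage value are arbitrary; the real code lives in rxe_queue.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a queue. */
struct qmodel_queue {
	uint32_t index;           /* driver-private index */
	uint32_t index_mask;      /* slots - 1, where slots is a power of two */
	uint32_t log2_elem_size;  /* element size as a shift count */
	size_t buf_size;          /* bytes of element storage */
};

/* Byte offset of the element selected by 'index'. */
static size_t qmodel_elem_offset(const struct qmodel_queue *q, uint32_t index)
{
	assert(q);
	return (size_t)index << q->log2_elem_size;
}

/* True when a whole element at 'off' lies inside the buffer. */
static int qmodel_offset_ok(const struct qmodel_queue *q, size_t off)
{
	return off + ((size_t)1 << q->log2_elem_size) <= q->buf_size;
}
```

With 8 slots of 1024 bytes, a zeroed index (what kzalloc() provides) selects offset 0, while an arbitrary leftover value such as 0x367265 selects an offset far outside the buffer unless it is first masked with index_mask.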
Bob Pearson Sept. 12, 2021, 2:42 p.m. UTC | #9
On 9/10/21 5:07 PM, Bart Van Assche wrote:
> On 9/10/21 2:47 PM, Bob Pearson wrote:
>> OK I checked out the kernel with the SHA number above and applied the patch series
>> and rebuilt and reinstalled the kernel. I checked out v36.0 of rdma-core and rebuilt
>> that. rdma is version 5.9.0 but I doubt that will have any effect. My startup script
>> is
>>
>>      export LD_LIBRARY_PATH=/home/bob/src/rdma-core/build/lib/:/usr/local/lib:/usr/lib
>>
>>
>>
>>      sudo ip link set dev enp0s3 mtu 8500
>>
>>      sudo ip addr add dev enp0s3 fe80::0a00:27ff:fe94:8a69/64
>>
>>      sudo rdma link add rxe0 type rxe netdev enp0s3
>>
>>
>> I am running on a Virtualbox VM instance of Ubuntu 21.04 with 20 cores and 8GB of RAM.
>>
>> The test looks like
>>
>>      sudo ./check -q srp/001
>>
>>      srp/001 (Create and remove LUNs)                             [passed]
>>
>>          runtime  1.174s  ...  1.236s
>>
>> There were no issues.
>>
>> Any guesses what else to look at?
> 
> The test I ran is different. I did not run any of the ip link / ip addr /
> rdma link commands since the blktests scripts already run the rdma link
> command. The bug I reported in my previous email is reproducible and
> triggers a VM halt.
> 
> Are we using the same kernel config? I attached my kernel config to my
> previous email. The source code location of the crash address is as
> follows:
> 
> (gdb) list *(rxe_completer+0x96d)
> 0x228d is in rxe_completer (drivers/infiniband/sw/rxe/rxe_comp.c:149).
> 144              */
> 145             wqe = queue_head(qp->sq.queue, QUEUE_TYPE_FROM_CLIENT);
> 146             *wqe_p = wqe;
> 147
> 148             /* no WQE or requester has not started it yet */
> 149             if (!wqe || wqe->state == wqe_state_posted)
> 150                     return pkt ? COMPST_DONE : COMPST_EXIT;
> 151
> 152             /* WQE does not require an ack */
> 153             if (wqe->state == wqe_state_done)
> 
> The disassembly output is as follows:
> 
> drivers/infiniband/sw/rxe/rxe_comp.c:
> 149             if (!wqe || wqe->state == wqe_state_posted)
>    0x0000000000002277 <+2391>:  test   %r12,%r12
>    0x000000000000227a <+2394>:  je     0x2379 <rxe_completer+2649>
>    0x0000000000002280 <+2400>:  lea    0x94(%r12),%rdi
>    0x0000000000002288 <+2408>:  call   0x228d <rxe_completer+2413>
>    0x000000000000228d <+2413>:  mov    0x94(%r12),%eax
>    0x0000000000002295 <+2421>:  test   %eax,%eax
>    0x0000000000002297 <+2423>:  je     0x237c <rxe_completer+2652>
> 
> So the instruction that triggers the crash is "mov 0x94(%r12),%eax".
> Does consumer_addr() perhaps return an invalid address under certain
> circumstances?
> 
> Thanks,
> 
> Bart.

By the way I did rebuild the kernel with your config file. No change. - Bob
Bart Van Assche Sept. 14, 2021, 3:26 a.m. UTC | #10
On 9/12/21 07:41, Bob Pearson wrote:
> Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space")
> Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_queue.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c
> index 85b812586ed4..72d95398e604 100644
> --- a/drivers/infiniband/sw/rxe/rxe_queue.c
> +++ b/drivers/infiniband/sw/rxe/rxe_queue.c
> @@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem,
>   	if (*num_elem < 0)
>   		goto err1;
>   
> -	q = kmalloc(sizeof(*q), GFP_KERNEL);
> +	q = kzalloc(sizeof(*q), GFP_KERNEL);
>   	if (!q)
>   		goto err1;

Hi Bob,

If I rebase this patch series on top of kernel v5.15-rc1 then the srp 
tests from the blktests suite pass. Kernel v5.15-rc1 includes the above 
patch. Feel free to add the following to this patch series:

Tested-by: Bart Van Assche <bvanassche@acm.org>

Thanks,

Bart.
Bob Pearson Sept. 14, 2021, 4:18 a.m. UTC | #11
On 9/13/21 10:26 PM, Bart Van Assche wrote:
> On 9/12/21 07:41, Bob Pearson wrote:
>> Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space")
>> Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_queue.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c
>> index 85b812586ed4..72d95398e604 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_queue.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_queue.c
>> @@ -63,7 +63,7 @@ struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe, int *num_elem,
>>       if (*num_elem < 0)
>>           goto err1;
>>   -    q = kmalloc(sizeof(*q), GFP_KERNEL);
>> +    q = kzalloc(sizeof(*q), GFP_KERNEL);
>>       if (!q)
>>           goto err1;
> 
> Hi Bob,
> 
> If I rebase this patch series on top of kernel v5.15-rc1 then the srp tests from the blktests suite pass. Kernel v5.15-rc1 includes the above patch. Feel free to add the following to this patch series:
> 
> Tested-by: Bart Van Assche <bvanassche@acm.org>
> 
> Thanks,
> 
> Bart.

Sadly, I have been trying to resolve the note from Shaib Rao, who was trying to make rping work.
His solution was not correct, but it opened a can of worms. The kernel verbs consumer APIs were all
using the same APIs from rxe_queue.h to manipulate the client ends of the queues, but that was
totally incorrect. Those APIs are written from the POV of the driver and use the private index, which
is not supposed to be visible to users of the queues. A whole day later I think I have that one about
fixed, so I will be resubmitting the series again in the morning. It's all just memory barriers, so
it may not affect you.

Bob