Message ID | 1666248232-63751-1-git-send-email-alibuda@linux.alibaba.com |
---|---|
Series | optimize the parallelism of SMC-R connections |
Hi Jan,

Sorry for the long delay. The main purpose of v3 is to make the optimizations work on SMC-D as well. Due to our environment, I can only test on SMC-R, so please help us verify stability and functionality on SMC-D. Thanks a lot.

If you have any problems, please let us know.

Also, the bug-fix patches still need to be reordered. Once the code review passes and the SMC-D tests are stable, I will do that in the next series.

On 10/20/22 2:43 PM, D.Wythe wrote:
> From: "D.Wythe" <alibuda@linux.alibaba.com>
>
> This patch set attempts to optimize the parallelism of SMC-R connections,
> mainly by reducing unnecessary blocking on locks, and to fix exceptions
> that occur after those optimizations.
>
> According to the off-CPU graph, the SMC workers' off-CPU time breaks
> down as follows:
>
> smc_close_passive_work (1.09%)
>   smcr_buf_unuse (1.08%)
>     smc_llc_flow_initiate (1.02%)
>
> smc_listen_work (48.17%)
>   __mutex_lock.isra.11 (47.96%)
>
> An ideal SMC-R connection process should only block on network IO
> events, but the SMC-R connection currently spends most of its time
> queued on locks.
>
> The goal of this patch set is to reach that ideal, where a connection
> is blocked on network IO events for the majority of its lifetime.
>
> There are three big locks here:
>
> 1. smc_client_lgr_pending & smc_server_lgr_pending
>
> 2. llc_conf_mutex
>
> 3. rmbs_lock & sndbufs_lock
>
> And one implementation issue:
>
> 1. confirm/delete rkey messages cannot be sent concurrently, although
>    the protocol allows it.
>
> Unfortunately, the above problems together limit the parallelism of
> SMC-R connections. If any one of them is left unsolved, our goal cannot
> be achieved.
>
> After this patch set, we get a nearly ideal off-CPU graph:
>
> smc_close_passive_work (41.58%)
>   smcr_buf_unuse (41.57%)
>     smc_llc_do_delete_rkey (41.57%)
>
> smc_listen_work (39.10%)
>   smc_clc_wait_msg (13.18%)
>     tcp_recvmsg_locked (13.18%)
>   smc_listen_find_device (25.87%)
>     smcr_lgr_reg_rmbs (25.87%)
>       smc_llc_do_confirm_rkey (25.87%)
>
> Most of the waiting time is now spent waiting for network IO events.
> This also brings a measurable performance improvement (roughly 1.3x to
> 1.9x for SMC-R) in our short-lived connection wrk/nginx benchmark:
>
> +--------------+------+------+-------+--------+------+--------+
> |conns/qps     |c4    | c8   | c16   | c32    | c64  | c200   |
> +--------------+------+------+-------+--------+------+--------+
> |SMC-R before  |9.7k  | 10k  | 10k   | 9.9k   | 9.1k | 8.9k   |
> +--------------+------+------+-------+--------+------+--------+
> |SMC-R now     |13k   | 19k  | 18k   | 16k    | 15k  | 12k    |
> +--------------+------+------+-------+--------+------+--------+
> |TCP           |15k   | 35k  | 51k   | 80k    | 100k | 162k   |
> +--------------+------+------+-------+--------+------+--------+
>
> The benefit shrinks as the number of connections increases because of
> the workqueue. If we change the workqueue to UNBOUND, we obtain at least
> a 4-5x performance improvement, reaching up to half of TCP's performance.
> However, that is not an elegant solution, and a proper optimization will
> be much more complicated. In any case, we will submit the relevant
> optimization patches as soon as possible.
>
> Please note that the lock-related problems must be solved first;
> otherwise, no matter how we optimize the workqueue, there won't be much
> improvement.
>
> Because this series makes many related changes to the code, please let
> me know if you have any questions or suggestions.
>
> Thanks
> D. Wythe
>
> v1 -> v2:
>
> 1. Fix panic in SMC-D scenario
> 2. Fix lnkc-related hashfn calculation error caused by operator
>    precedence
> 3. Only wake up one connection if the lnk is not active
> 4. Delete obsolete unlock logic in smc_listen_work()
> 5. PATCH format, use reverse Christmas tree ordering
> 6. PATCH format, rename all xxx_lnk_xxx functions to xxx_link_xxx
> 7. PATCH format, add correct Fixes tags to the bug-fix patches
> 8. PATCH format, fix some spelling errors
> 9. PATCH format, rename slow to do_slow
>
> v2 -> v3:
>
> 1. Add SMC-D support. Remove the concept of a link cluster, since SMC-D
>    has no links at all, and replace it with an lgr decision maker, which
>    advises SMC-D and SMC-R on whether to create a new link group.
>
> 2. Fix the corruption problem on SMC-D described in the patch 'fix
>    application data exception'.
>
> D. Wythe (10):
>   net/smc: remove locks smc_client_lgr_pending and
>     smc_server_lgr_pending
>   net/smc: fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending
>   net/smc: allow confirm/delete rkey response deliver multiplex
>   net/smc: make SMC_LLC_FLOW_RKEY run concurrently
>   net/smc: llc_conf_mutex refactor, replace it with rw_semaphore
>   net/smc: use read semaphores to reduce unnecessary blocking in
>     smc_buf_create() & smcr_buf_unuse()
>   net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs()
>   net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore
>   net/smc: Fix potential panic dues to unprotected
>     smc_llc_srv_add_link()
>   net/smc: fix application data exception
>
>  net/smc/af_smc.c   |  70 ++++----
>  net/smc/smc_core.c | 478 +++++++++++++++++++++++++++++++++++++++++++++++------
>  net/smc/smc_core.h |  36 +++-
>  net/smc/smc_llc.c  | 277 ++++++++++++++++++++++---------
>  net/smc/smc_llc.h  |   6 +
>  net/smc/smc_wr.c   |  10 --
>  net/smc/smc_wr.h   |  10 ++
>  7 files changed, 712 insertions(+), 175 deletions(-)
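Several patches in this series (replacing llc_conf_mutex, rmbs_lock and sndbufs_lock with rw_semaphores, and taking them for read where possible) rely on one property: a read-write lock admits many readers at once while still giving a writer exclusive access. The sketch below illustrates that property in userspace, using POSIX pthread_rwlock_t as a stand-in for the kernel's rw_semaphore. The lock and function names are hypothetical and are not taken from the SMC code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a per-link-group buffer lock. */
static pthread_mutex_t buf_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t buf_rwsem = PTHREAD_RWLOCK_INITIALIZER;

/*
 * With a plain mutex, a second "reader" (e.g. another connection that
 * only wants to look up a reusable buffer) is refused while the first
 * holds the lock: trylock returns EBUSY.
 */
bool mutex_allows_second_reader(void)
{
	bool ok;

	pthread_mutex_lock(&buf_mutex);
	ok = pthread_mutex_trylock(&buf_mutex) == 0;
	if (ok)
		pthread_mutex_unlock(&buf_mutex);
	pthread_mutex_unlock(&buf_mutex);
	return ok;
}

/* With a read-write lock, a second reader is admitted concurrently. */
bool rwlock_allows_second_reader(void)
{
	bool ok;

	pthread_rwlock_rdlock(&buf_rwsem);
	ok = pthread_rwlock_tryrdlock(&buf_rwsem) == 0;
	if (ok)
		pthread_rwlock_unlock(&buf_rwsem);
	pthread_rwlock_unlock(&buf_rwsem);
	return ok;
}

/* A writer (e.g. a path that changes the buffer lists) is still excluded. */
bool rwlock_blocks_writer_while_read(void)
{
	int rc;

	pthread_rwlock_rdlock(&buf_rwsem);
	rc = pthread_rwlock_trywrlock(&buf_rwsem);
	if (rc == 0)
		pthread_rwlock_unlock(&buf_rwsem);
	pthread_rwlock_unlock(&buf_rwsem);
	return rc == EBUSY;
}
```

In kernel code, the corresponding operations on a struct rw_semaphore are down_read()/up_read() and down_write()/up_write(); the gain only materializes on paths that can genuinely run under the read side.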
On 20.10.2022 09:00, D. Wythe wrote:
> [...]

Hi D. Wythe,

no problem and thank you. I'm going to test your changes and let you know as soon as I'm done.

- Jan
On 20/10/2022 09:00, D. Wythe wrote:
> [...]

Hi D. Wythe,

thank you again for your submission. I ran the first tests and here are my findings:

For SMC-R, we are facing problems while unloading the smc module:

vvvvvvvvvv

[root@testsys10 ~]# dmesg -C
[root@testsys10 ~]# dmesg
[root@testsys10 ~]# rmmod ism
[root@testsys10 ~]# rmmod smc_diag
[root@testsys10 ~]# dmesg
[ 51.671365] smc: removing smcd device 1522:00:00.0
[root@testsys10 ~]# rmmod smc
[root@testsys10 ~]# dmesg
[ 51.671365] smc: removing smcd device 1522:00:00.0
[ 65.378445] NET: Unregistered PF_SMC protocol family
[ 65.378463] ------------[ cut here ]------------
[ 65.378465] WARNING: CPU: 0 PID: 1155 at kernel/workqueue.c:3066 __flush_work.isra.0+0x28a/0x298
[ 65.378476] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat mlx5_ib nf_conntrack ib_uverbs nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink mlx5_core smc(-) ib_core vfio_ccw s390_trng mdev vfio_iommu_type1 vfio sch_fq_codel configfs ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common pkey zcrypt rng_core autofs4 [last unloaded: smc_diag]
[ 65.378509] CPU: 0 PID: 1155 Comm: rmmod Not tainted 6.1.0-rc1-00035-g9980a965416f #4
[ 65.378514] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
[ 65.378517] Krnl PSW : 0704c00180000000 00000000f9d5f17e (__flush_work.isra.0+0x28e/0x298)
[ 65.378523] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 65.380675] Krnl GPRS: 8000000000000001 0000000000000000 000003ff7fd40270 0000000000000000
[ 65.380683] 0000038000c73d70 000e002100000000 0000000000000000 0000000000000001
[ 65.380686] 0000038000c73d70 0000000000000000 000003ff7fd40270 000003ff7fd40270
[ 65.380688] 000000009b8d2100 000003ffe38f98f8 0000038000c73cd0 0000038000c73c30
[ 65.380697] Krnl Code: 00000000f9d5f172: a7780000 lhi %r7,0
           00000000f9d5f176: a7f4ff7b brc 15,00000000f9d5f06c
          #00000000f9d5f17a: af000000 mc 0,0
          >00000000f9d5f17e: a7780000 lhi %r7,0
           00000000f9d5f182: a7f4ff75 brc 15,00000000f9d5f06c
           00000000f9d5f186: 0707 bcr 0,%r7
           00000000f9d5f188: c004005daa34 brcl 0,00000000fa9145f0
           00000000f9d5f18e: ebaff0680024 stmg %r10,%r15,104(%r15)
[ 65.380773] Call Trace:
[ 65.380774] [<00000000f9d5f17e>] __flush_work.isra.0+0x28e/0x298
[ 65.380779] [<00000000f9d61228>] __cancel_work_timer+0x130/0x1c0
[ 65.380782] [<00000000fa46b1b4>] rhashtable_free_and_destroy+0x2c/0x170
[ 65.380787] [<000003ff7fd3a08e>] smc_exit+0x3e/0x1b8 [smc]
[ 65.380804] [<00000000f9de946a>] __do_sys_delete_module+0x1a2/0x298
[ 65.380809] [<00000000fa8f85ac>] __do_syscall+0x1d4/0x200
[ 65.380814] [<00000000fa907722>] system_call+0x82/0xb0
[ 65.380817] Last Breaking-Event-Address:
[ 65.380818] [<00000000f9d5ef24>] __flush_work.isra.0+0x34/0x298
[ 65.380820] ---[ end trace 0000000000000000 ]---
[ 65.380828] smc: removing ib device mlx5_0
[ 65.380833] smc: removing ib device mlx5_1

^^^^^^^^^^

For SMC-D, it seems like your decision maker is causing some trouble (a crash). I have not had time to look into it yet, but here is the console log; maybe you will spot the problem faster than me:

vvvvvvvvvv

[ 135.528259] smc-tests: test_cs_security started
[ 136.397056] illegal operation: 0001 ilc:1 [#1] SMP
[ 136.397064] Modules linked in: tcp_diag inet_diag ism mlx5_ib ib_uverbs mlx5_core smc_diag smc ib_core vmur nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tab
[ 136.397093] CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.1.0-rc1-00035-g1c11cab281ca #4
[ 136.397098] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
[ 136.397100] Workqueue: smc_hs_wq smc_listen_work [smc]
[ 136.397123] Krnl PSW : 0704e00180000000 0000000000000002 (0x2)
[ 136.397128] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 136.397133] Krnl GPRS: 0000000000000001 0000000000000000 00000000a5670600 0000000000000000
[ 136.398410] 0000000000000000 000003ff7feee620 00000000000000c8 0000000000000000
[ 136.398417] 000003ff7feed2b8 00000000a5670600 000003ff7feed168 000003ff7fed1628
[ 136.398420] 0000000080334200 0000000000000001 000003ff7fed3ab0 0000037fffb5fa30
[ 136.398425] Krnl Code:#0000000000000000: 0000 illegal
[ 136.398425] >0000000000000002: 0000 illegal
[ 136.398425] 0000000000000004: 0000 illegal
[ 136.398425] 0000000000000006: 0000 illegal
[ 136.398425] 0000000000000008: 0000 illegal
[ 136.398425] 000000000000000a: 0000 illegal
[ 136.398425] 000000000000000c: 0000 illegal
[ 136.398425] 000000000000000e: 0000 illegal
[ 136.398465] Call Trace:
[ 136.398469] [<0000000000000002>] 0x2
[ 136.398472] ([<00000001790fdbde>] release_sock+0x6e/0xd8)
[ 136.398482] [<000003ff7fed746a>] smc_conn_create+0xc2/0x9d8 [smc]
01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 01.
01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.

[ 136.408436] [<000003ff7fec8206>] smc_find_ism_v2_device_serv+0x186/0x288 [smc]
[ 136.408444] [<000003ff7fec8336>] smc_listen_find_device+0x2e/0x370 [smc]
[ 136.408452] [<000003ff7fecaa8a>] smc_listen_work+0x2ca/0x580 [smc]
[ 136.408459] [<00000001788481e8>] process_one_work+0x200/0x458
[ 136.408466] [<000000017884896e>] worker_thread+0x66/0x480
[ 136.408470] [<0000000178851888>] kthread+0x108/0x110
[ 136.408474] [<00000001787d72cc>] __ret_from_fork+0x3c/0x58
[ 136.408478] [<00000001793ef75a>] ret_from_fork+0xa/0x40
[ 136.408484] Last Breaking-Event-Address:
[ 136.408486] [<000003ff7fed3aae>] smc_get_or_create_lgr_decision_maker.constprop.0+0xe6/0x398 [smc]
[ 136.408495] Kernel panic - not syncing: Fatal exception in interrupt

^^^^^^^^^^

- Jan
Hi Jan,

Sorry for this bug. It's my fault for not checking the code carefully enough. Here is the problem:

int __init smc_core_init(void)
{
	int i;

	/* init smc lgr decision maker builder */
	for (i = 0; i < SMC_TYPE_D; i++)

The condition i < SMC_TYPE_D should be i <= SMC_TYPE_D; otherwise the SMC-D related map is never initialized. I think both bugs were caused by this.

I have reproduced the first problem and verified that this change fixes it. Please help me check whether the SMC-D problem is fixed by this change too, thanks.

By the way, is there any way to simulate an SMC-D device for testing? All of our problems so far were caused by insufficient consideration of SMC-D. In fact, we have some SMC-D related work planned for the future, and bothering you every time does not seem like a good approach.

Best wishes,
D. Wythe

On 10/21/22 7:57 PM, Jan Karcher wrote:
> [...]
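The loop-bound problem described in this reply can be reproduced outside the kernel in a few lines of C. The sketch below assumes SMC_TYPE_R == 0 and SMC_TYPE_D == 1, as defined in net/smc/smc_clc.h; the dm_map structure and the dm_* functions are hypothetical simplifications, not the actual smc_core.c code. With the condition i < SMC_TYPE_D, only the SMC-R entry is set up, so any later use of the SMC-D entry (a lookup on the SMC-D connection path, or the teardown in smc_exit()) touches an object that was never initialized, which is consistent with both failures Jan reported.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed to mirror the SMC CLC type values: SMC-R is 0, SMC-D is 1. */
#define SMC_TYPE_R 0
#define SMC_TYPE_D 1

/* Hypothetical, simplified per-type decision-maker map. */
struct dm_map {
	bool initialized;
	int nr_entries;
};

/* One map per type, indexed by SMC_TYPE_R .. SMC_TYPE_D inclusive. */
static struct dm_map dm_maps[SMC_TYPE_D + 1];

static void dm_map_init(struct dm_map *map)
{
	map->initialized = true;
	map->nr_entries = 0;
}

/* Buggy bound: stops before SMC_TYPE_D, so the SMC-D map is skipped. */
void dm_init_buggy(void)
{
	/* Reset so both variants start from the same zeroed state. */
	memset(dm_maps, 0, sizeof(dm_maps));
	for (int i = 0; i < SMC_TYPE_D; i++)
		dm_map_init(&dm_maps[i]);
}

/* Fixed bound: SMC_TYPE_D is a valid index, so it must be included. */
void dm_init_fixed(void)
{
	memset(dm_maps, 0, sizeof(dm_maps));
	for (int i = 0; i <= SMC_TYPE_D; i++)
		dm_map_init(&dm_maps[i]);
}

/* Reports whether the map for the given type was set up. */
bool dm_map_ready(int type)
{
	return dm_maps[type].initialized;
}
```

Bounding the loop by the array's element count (sizeof(dm_maps) / sizeof(dm_maps[0]), or ARRAY_SIZE() in kernel code) instead of a specific type constant is one way to keep the loop from drifting out of sync with the table again.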
Hi D. Wythe,

I will reply with feedback on your fix in the v4 thread.

Regarding your questions:
We are aware of this situation and are currently evaluating how we want to deal with SMC-D in the future; as things stand right now, I understand your frustration regarding SMC-D testing.
Please give me some time to reach out to the right people and collect some information to answer your question. I'll let you know as soon as I have an answer.

Thanks
- Jan

On 21/10/2022 17:57, D. Wythe wrote:
> [...]
On Mon, Oct 24, 2022 at 03:10:54PM +0200, Jan Karcher wrote:
> Hi D. Wythe,
>
> I reply with the feedback on your fix to your v4 fix.
>
> Regarding your questions:
> We are aware of this situation and we are currently evaluating how we want
> to deal with SMC-D in the future because as of right now I can understand
> your frustration regarding the SMC-D testing.
> Please give me some time to hit up the right people and collect some
> information to answer your question. I'll let you know as soon as I have an
> answer.

Hi Jan,

We sent an RFC [1] to mock an SMC-D device for inter-VM communication. The
original purpose is not to test, but for now it could be useful for the
people who are going to test without physical devices in the community.

This driver basically works, but I would improve it for testing. Before
that, what do you think about it?

And where should we put this driver? In the kernel with the SMC code, or
merged into separate SMC test cases? I haven't made up my mind yet.

[1] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/

Cheers,
Tony Lu

>
> Thanks
> - Jan
>
> On 21/10/2022 17:57, D. Wythe wrote:
> > Hi Jan,
> >
> > Sorry for this bug. It's my fault for not checking the code carefully
> > enough; here is the problem:
> >
> > int __init smc_core_init(void)
> > {
> >         int i;
> >
> >         /* init smc lgr decision maker builder */
> >         for (i = 0; i < SMC_TYPE_D; i++)
> >
> >
> > i < SMC_TYPE_D should be changed to i <= SMC_TYPE_D; otherwise the
> > SMC-D related map is never initialized. I think both bugs were caused
> > by it.
> >
> >
> > I have reproduced the first problem and verified that it can be fixed.
> > Please help me check whether the SMC-D problem is fixed too after this
> > change, thx.
> >
> > By the way, is there any way to simulate an SMC-D device for testing? All
> > of our problems are caused by insufficient consideration of SMC-D.
> > In fact, we have some SMC-D related work planned for the future. It
> > doesn't seem ideal to bother you every time.
> >
> >
> > Best Wishes.
> > D. Wythe
> >
> > On 10/21/22 7:57 PM, Jan Karcher wrote:
> > >
> > >
> > > On 20/10/2022 09:00, D. Wythe wrote:
> > > >
> > > > Hi Jan,
> > > >
> > > > Sorry for the long delay. The main purpose of v3 is to make the
> > > > optimizations also work on SMC-D. Due to the environment,
> > > > I can only test it on SMC-R, so please help us verify its
> > > > stability and functionality on SMC-D.
> > > > Thanks a lot.
> > > >
> > > > If you have any problems, please let us know.
> > > >
> > > > Besides, the bug-fix PATCHes need to be reordered. After the code
> > > > review passes and the SMC-D tests are stable, I will adjust them
> > > > in the next series.
> > > >
> > > >
> > >
> > > Hi D. Wythe,
> > >
> > > Thank you again for your submission. I ran the first tests and here
> > > are my findings:
> > >
> > > For SMC-R we are facing problems during unloading of the smc module:
> > >
> > > vvvvvvvvvv
> > >
> > > [root@testsys10 ~]# dmesg -C
> > > [root@testsys10 ~]# dmesg
> > > [root@testsys10 ~]# rmmod ism
> > > [root@testsys10 ~]# rmmod smc_diag
> > > [root@testsys10 ~]# dmesg
> > > [ 51.671365] smc: removing smcd device 1522:00:00.0
> > > [root@testsys10 ~]# rmmod smc
> > > [root@testsys10 ~]# dmesg
> > > [ 51.671365] smc: removing smcd device 1522:00:00.0
> > > [ 65.378445] NET: Unregistered PF_SMC protocol family
> > > [ 65.378463] ------------[ cut here ]------------
> > > [ 65.378465] WARNING: CPU: 0 PID: 1155 at kernel/workqueue.c:3066
> > > __flush_work.isra.0+0x28a/0x298
> > > [ 65.378476] Modules linked in: nft_fib_inet nft_fib_ipv4
> > > nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> > > nft_reject nft_ct nft_chain_nat nf_nat mlx5_ib nf_conntrack
> > > ib_uverbs nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
> > > mlx5_core smc(-) ib_core vfio_ccw s390_trng mdev vfio_iommu_type1
> > > vfio sch_fq_codel configfs ghash_s390 prng chacha_s390 libchacha
> > > aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390
> > > sha256_s390 sha1_s390 sha_common pkey zcrypt rng_core autofs4 [last
> > > unloaded: smc_diag]
> > > [ 65.378509] CPU: 0 PID: 1155 Comm: rmmod Not tainted
> > > 6.1.0-rc1-00035-g9980a965416f #4
> > > [ 65.378514] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
> > > [ 65.378517] Krnl PSW : 0704c00180000000 00000000f9d5f17e
> > > (__flush_work.isra.0+0x28e/0x298)
> > > [ 65.378523] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3
> > > CC:0 PM:0 RI:0 EA:3
> > > [ 65.380675] Krnl GPRS: 8000000000000001 0000000000000000
> > > 000003ff7fd40270 0000000000000000
> > > [ 65.380683] 0000038000c73d70 000e002100000000
> > > 0000000000000000 0000000000000001
> > > [ 65.380686] 0000038000c73d70 0000000000000000
> > > 000003ff7fd40270 000003ff7fd40270
> > > [ 65.380688] 000000009b8d2100 000003ffe38f98f8
> > > 0000038000c73cd0 0000038000c73c30
> > > [ 65.380697] Krnl Code: 00000000f9d5f172: a7780000 lhi %r7,0
> > > 00000000f9d5f176: a7f4ff7b brc
> > > 15,00000000f9d5f06c
> > > #00000000f9d5f17a: af000000
> > > mc 0,0
> > > >00000000f9d5f17e: a7780000 lhi
> > > %r7,0
> > > 00000000f9d5f182: a7f4ff75 brc
> > > 15,00000000f9d5f06c
> > > 00000000f9d5f186: 0707 bcr
> > > 0,%r7
> > > 00000000f9d5f188: c004005daa34
> > > brcl 0,00000000fa9145f0
> > > 00000000f9d5f18e: ebaff0680024
> > > stmg %r10,%r15,104(%r15)
> > > [ 65.380773] Call Trace:
> > > [ 65.380774] [<00000000f9d5f17e>] __flush_work.isra.0+0x28e/0x298
> > > [ 65.380779] [<00000000f9d61228>] __cancel_work_timer+0x130/0x1c0
> > > [ 65.380782] [<00000000fa46b1b4>]
> > > rhashtable_free_and_destroy+0x2c/0x170
> > > [ 65.380787] [<000003ff7fd3a08e>] smc_exit+0x3e/0x1b8 [smc]
> > > [ 65.380804] [<00000000f9de946a>] __do_sys_delete_module+0x1a2/0x298
> > > [ 65.380809] [<00000000fa8f85ac>] __do_syscall+0x1d4/0x200
> > > [ 65.380814] [<00000000fa907722>] system_call+0x82/0xb0
> > > [ 65.380817] Last Breaking-Event-Address:
> > > [ 65.380818] [<00000000f9d5ef24>] __flush_work.isra.0+0x34/0x298
> > > [ 65.380820] ---[ end trace 0000000000000000 ]---
> > > [ 65.380828] smc: removing ib device mlx5_0
> > > [ 65.380833] smc: removing ib device mlx5_1
> > >
> > > ^^^^^^^^^^
> > >
> > > For SMC-D it seems like your decision maker is causing some trouble
> > > (a crash). I did not have the time yet to look into it, I still dump
> > > you the console log - maybe you're seeing the problem faster than
> > > me:
> > >
> > >
> > > vvvvvvvvvv
> > >
> > > [ 135.528259] smc-tests: test_cs_security started
> > > [ 136.397056] illegal operation: 0001 ilc:1 [#1] SMP
> > > [ 136.397064] Modules linked in: tcp_diag inet_diag ism mlx5_ib
> > > ib_uverbs mlx5_core smc_diag smc ib_core vmur nft_fib_inet nft_fib_
> > > ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
> > > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
> > > nf_defra
> > > g_ipv6 nf_defrag_ipv4 ip_set nf_tab
> > > [ 136.397093] CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted
> > > 6.1.0-rc1-00035-g1c11cab281ca #4
> > > [ 136.397098] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
> > > [ 136.397100] Workqueue: smc_hs_wq smc_listen_work [smc]
> > > [ 136.397123] Krnl PSW : 0704e00180000000 0000000000000002 (0x2)
> > > [ 136.397128] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3
> > > CC:2 PM:0 RI:0 EA:3
> > > [ 136.397133] Krnl GPRS: 0000000000000001 0000000000000000
> > > 00000000a5670600 0000000000000000
> > >
> > > [ 136.398410] 0000000000000000 000003ff7feee620
> > > 00000000000000c8 0000000000000000
> > > [ 136.398417] 000003ff7feed2b8 00000000a5670600
> > > 000003ff7feed168 000003ff7fed1628
> > > [ 136.398420] 0000000080334200 0000000000000001
> > > 000003ff7fed3ab0 0000037fffb5fa30
> > > [ 136.398425] Krnl Code:#0000000000000000: 0000 illegal
> > > [ 136.398425] >0000000000000002: 0000 illegal
> > > [ 136.398425] 0000000000000004: 0000 illegal
> > > [ 136.398425] 0000000000000006: 0000 illegal
> > > [ 136.398425] 0000000000000008: 0000 illegal
> > > [ 136.398425] 000000000000000a: 0000 illegal
> > > [ 136.398425] 000000000000000c: 0000 illegal
> > > [ 136.398425] 000000000000000e: 0000 illegal
> > > [ 136.398465] Call Trace:
> > > [ 136.398469] [<0000000000000002>] 0x2
> > > [ 136.398472] ([<00000001790fdbde>] release_sock+0x6e/0xd8)
> > > [ 136.398482] [<000003ff7fed746a>] smc_conn_create+0xc2/0x9d8 [smc]
> > > 01: HCPGSP2629I The virtual machine is placed in CP mode due to a
> > > SIGP stop from CPU 01.
> > > 01: HCPGSP2629I The virtual machine is placed in CP mode due to a
> > > SIGP stop from CPU 00.
> > >
> > > [ 136.408436] [<000003ff7fec8206>]
> > > smc_find_ism_v2_device_serv+0x186/0x288 [smc]
> > > [ 136.408444] [<000003ff7fec8336>]
> > > smc_listen_find_device+0x2e/0x370 [smc]
> > > [ 136.408452] [<000003ff7fecaa8a>] smc_listen_work+0x2ca/0x580 [smc]
> > > [ 136.408459] [<00000001788481e8>] process_one_work+0x200/0x458
> > > [ 136.408466] [<000000017884896e>] worker_thread+0x66/0x480
> > > [ 136.408470] [<0000000178851888>] kthread+0x108/0x110
> > > [ 136.408474] [<00000001787d72cc>] __ret_from_fork+0x3c/0x58
> > > [ 136.408478] [<00000001793ef75a>] ret_from_fork+0xa/0x40
> > > [ 136.408484] Last Breaking-Event-Address:
> > > [ 136.408486] [<000003ff7fed3aae>]
> > > smc_get_or_create_lgr_decision_maker.constprop.0+0xe6/0x398 [smc]
> > > [ 136.408495] Kernel panic - not syncing: Fatal exception in interrupt
> > >
> > > ^^^^^^^^^^
> > >
> > > - Jan
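For readers following the thread, here is a minimal user-space C sketch of the off-by-one D. Wythe describes in smc_core_init(): a table with one slot per SMC type, initialized by a loop whose bound stops before the last type. The enum values, struct and function names are hypothetical stand-ins, not the net/smc code. In this sketch dispatch() checks the slot before calling through it, so a skipped slot shows up as -1 rather than a jump to a bogus address. The reported panic (PSW at 0x2, last breaking event in smc_get_or_create_lgr_decision_maker) is consistent with such a skipped slot, but the sketch does not prove that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SMC link-group types (not net/smc). */
enum sketch_smc_type { SKETCH_TYPE_R = 0, SKETCH_TYPE_D = 1 };

/* Hypothetical per-type decision-maker slot. */
struct sketch_decision_maker {
	bool initialized;
	int (*decide)(void);
};

static int sketch_decide(void) { return 0; }

/* One slot per type, so valid indices are 0..SKETCH_TYPE_D inclusive. */
static struct sketch_decision_maker makers[SKETCH_TYPE_D + 1];

static void init_slot(int i)
{
	makers[i].initialized = true;
	makers[i].decide = sketch_decide;
}

/* Buggy bound: `<` stops one short and never touches the SMC-D slot. */
static void init_makers_buggy(void)
{
	for (int i = 0; i < SKETCH_TYPE_D; i++)
		init_slot(i);
}

/* Fixed bound: `<=` also covers the last type. */
static void init_makers_fixed(void)
{
	for (int i = 0; i <= SKETCH_TYPE_D; i++)
		init_slot(i);
}

/* The sketch refuses to call through a slot that was never set up. */
static int dispatch(enum sketch_smc_type type)
{
	assert((size_t)type < sizeof(makers) / sizeof(makers[0]));
	if (!makers[type].initialized)
		return -1;
	return makers[type].decide();
}

/* Clear every slot so both loop variants can be compared from scratch. */
static void reset_makers(void)
{
	for (size_t i = 0; i < sizeof(makers) / sizeof(makers[0]); i++)
		makers[i] = (struct sketch_decision_maker){ 0 };
}
```

Deriving the loop bound from the table itself (ARRAY_SIZE() in kernel code) avoids having to remember whether the last enum value is inclusive.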
On 25/10/2022 08:13, Tony Lu wrote:
> On Mon, Oct 24, 2022 at 03:10:54PM +0200, Jan Karcher wrote:
>> Hi D. Wythe,
>>
>> I reply with the feedback on your fix to your v4 fix.
>>
>> Regarding your questions:
>> We are aware of this situation and we are currently evaluating how we want
>> to deal with SMC-D in the future because as of right now I can understand
>> your frustration regarding the SMC-D testing.
>> Please give me some time to hit up the right people and collect some
>> information to answer your question. I'll let you know as soon as I have an
>> answer.

Hi Tony (and D.),

>
> Hi Jan,
>
> We sent an RFC [1] to mock an SMC-D device for inter-VM communication. The
> original purpose is not to test, but for now it could be useful for the
> people who are going to test without physical devices in the community.

I'm aware of the RFC, and various people at IBM have looked over it.

As stated in my last mail, we are aware that the entanglement between SMC-D
and ISM is causing problems for the community.
To give you a little insight:

In order to improve the code quality and usability for the broader community,
we are working on placing an API between SMC-D and the ISM device. Once this
API is complete, it will be easier to use different "devices" for SMC-D. One
could be your device driver for inter-VM communication (ivshmem).
Another one could be a "Dummy-Device" which just implements the required
interface and acts as a loopback device. This would work only within a single
Linux instance, and would thus be the perfect device for the broader
community to test SMC-D logic.
We hope that these changes remove the hardware restrictions and that
the community picks up the idea, implements devices, and improves SMC
(including SMC-D and SMC-R) even more in the future!

As I said (and as Alexandra also teased in a response to your RFC), this API
feature is currently being developed and is in our internal reviews. This
would make your idea of inter-VM communication a lot easier and would
provide a clean base to build upon in the future.

>
> This driver basically works, but I would improve it for testing. Before
> that, what do you think about it?

I think it is a great idea and we should definitely give it a shot! I'm also
putting a lot of weight on code quality and future maintainability. The API is
a key feature there, improving the usability for the community and our work as
maintainers. So, for the sake of the future of the SMC code base, I'd like
to wait for the API before putting your changes upstream, and then use your
idea to see if it fits our (and your) requirements.

>
> And where should we put this driver? In the kernel with the SMC code, or
> merged into separate SMC test cases? I haven't made up my mind yet.

We are not sure either currently, and have to think about that for a bit. I
think your driver could be a classic driver, since it is usable for a
real-world problem (communication between two VMs on the same host). If we
look at the "Dummy-Device" above, we see that it does not provide any value
besides testing. Feel free to share your ideas on that topic.

>
> [1] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/
>
> Cheers,
> Tony Lu

A friendly disclaimer: even though this API feature is pretty far along in the
development process, we may still decide to drop it if it does
not meet our quality expectations. But of course we'll keep you updated.

- Jan
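Jan's proposal of an API between SMC-D and the underlying device (ISM, an ivshmem-based driver, or a loopback "Dummy-Device") is essentially an ops table that the SMC-D core calls without knowing which backend sits behind it. The C sketch below assumes a deliberately tiny, hypothetical interface (register a buffer, write into a registered buffer); it is not the API IBM was developing, whose shape is not given in this thread. The loopback backend simply memcpy()s into a buffer registered in the same address space, which is why such a device only works within a single Linux instance.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_BUFS 4

/* Hypothetical device-independent interface the SMC-D core would call. */
struct sketch_smcd_ops {
	int (*register_buf)(void *priv, int idx, char *buf, size_t len);
	int (*write)(void *priv, int idx, size_t off, const void *data,
		     size_t len);
};

/* Loopback "device": the peer's buffers live in the same address space. */
struct sketch_loopback {
	char *bufs[SKETCH_MAX_BUFS];
	size_t lens[SKETCH_MAX_BUFS];
};

static int lo_register_buf(void *priv, int idx, char *buf, size_t len)
{
	struct sketch_loopback *lo = priv;

	if (idx < 0 || idx >= SKETCH_MAX_BUFS)
		return -1;
	lo->bufs[idx] = buf;
	lo->lens[idx] = len;
	return 0;
}

static int lo_write(void *priv, int idx, size_t off, const void *data,
		    size_t len)
{
	struct sketch_loopback *lo = priv;

	if (idx < 0 || idx >= SKETCH_MAX_BUFS || !lo->bufs[idx])
		return -1;
	if (off > lo->lens[idx] || len > lo->lens[idx] - off)
		return -1;
	/* A hardware backend would move data to the peer; loopback copies. */
	memcpy(lo->bufs[idx] + off, data, len);
	return 0;
}

static const struct sketch_smcd_ops loopback_ops = {
	.register_buf = lo_register_buf,
	.write = lo_write,
};

/* The "core" only sees ops + opaque priv, so backends are swappable. */
static int sketch_core_send(const struct sketch_smcd_ops *ops, void *priv,
			    int idx, const char *msg)
{
	assert(ops && ops->write);
	return ops->write(priv, idx, 0, msg, strlen(msg) + 1);
}
```

An ivshmem or ISM backend would provide its own functions with the same signatures; the core code calling sketch_core_send() would not change.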
On Wed, Oct 26, 2022 at 03:12:48PM +0200, Jan Karcher wrote:
>
>
> On 25/10/2022 08:13, Tony Lu wrote:
> > On Mon, Oct 24, 2022 at 03:10:54PM +0200, Jan Karcher wrote:
> > > Hi D. Wythe,
> > >
> > > I reply with the feedback on your fix to your v4 fix.
> > >
> > > Regarding your questions:
> > > We are aware of this situation and we are currently evaluating how we want
> > > to deal with SMC-D in the future because as of right now I can understand
> > > your frustration regarding the SMC-D testing.
> > > Please give me some time to hit up the right people and collect some
> > > information to answer your question. I'll let you know as soon as I have an
> > > answer.
>
> Hi Tony (and D.),
>
> >
> > Hi Jan,
> >
> > We sent an RFC [1] to mock an SMC-D device for inter-VM communication. The
> > original purpose is not to test, but for now it could be useful for the
> > people who are going to test without physical devices in the community.
>
> I'm aware of the RFC, and various people at IBM have looked over it.
>
> As stated in my last mail, we are aware that the entanglement between SMC-D
> and ISM is causing problems for the community.
> To give you a little insight:
>
> In order to improve the code quality and usability for the broader community,
> we are working on placing an API between SMC-D and the ISM device. Once this
> API is complete, it will be easier to use different "devices" for SMC-D. One
> could be your device driver for inter-VM communication (ivshmem).
> Another one could be a "Dummy-Device" which just implements the required
> interface and acts as a loopback device. This would work only within a single
> Linux instance, and would thus be the perfect device for the broader
> community to test SMC-D logic.

That sounds great :-) It will provide many possibilities.

> We hope that these changes remove the hardware restrictions and that
> the community picks up the idea, implements devices, and improves SMC
> (including SMC-D and SMC-R) even more in the future!
>
> As I said (and as Alexandra also teased in a response to your RFC), this API
> feature is currently being developed and is in our internal reviews. This
> would make your idea of inter-VM communication a lot easier and would
> provide a clean base to build upon in the future.

Great +1.

>
> >
> > This driver basically works, but I would improve it for testing. Before
> > that, what do you think about it?
>
> I think it is a great idea and we should definitely give it a shot! I'm also
> putting a lot of weight on code quality and future maintainability. The API is
> a key feature there, improving the usability for the community and our work as
> maintainers. So, for the sake of the future of the SMC code base, I'd like
> to wait for the API before putting your changes upstream, and then use your
> idea to see if it fits our (and your) requirements.

Sure. We are very much looking forward to the new API :-)

Maybe we can discuss this API on the mailing list, and I'd like to adapt to
it first.

>
> >
> > And where should we put this driver? In the kernel with the SMC code, or
> > merged into separate SMC test cases? I haven't made up my mind yet.
>
> We are not sure either currently, and have to think about that for a bit. I
> think your driver could be a classic driver, since it is usable for a
> real-world problem (communication between two VMs on the same host). If we
> look at the "Dummy-Device" above, we see that it does not provide any value
> besides testing. Feel free to share your ideas on that topic.

I agree with this. SMC would provide a common capability, so that each
individual device can be introduced with its own classic driver, much like
Ethernet cards and their drivers.

As for dummy devices, I have an idea for providing the same shared-memory
capability within a single host, just like the loopback device for TCP/IP.
SMC-D with a loopback device also shows better performance compared with
other protocols. Maybe SMC can cover all scenarios, including inter-host
(SMC-R), inter-VM (SMC-D) and inter-local-process (SMC-D) communication,
with the fastest path, and make things easier for user space. I'd like to
share this RFC later when it's ready.

>
> >
> > [1] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/
> >
> > Cheers,
> > Tony Lu
>
> A friendly disclaimer: even though this API feature is pretty far along in the
> development process, we may still decide to drop it if it does
> not meet our quality expectations. But of course we'll keep you updated.
>

No problem. If there is anything we can be involved in, we'd be glad to
help.

Cheers,
Tony Lu

> - Jan
From: "D.Wythe" <alibuda@linux.alibaba.com>

This patch set attempts to optimize the parallelism of SMC-R connections,
mainly by reducing unnecessary blocking on locks, and to fix exceptions that
occur after those optimizations.

According to the off-CPU graph, the SMC workers' off-CPU time looks like this:

smc_close_passive_work                          (1.09%)
        smcr_buf_unuse                          (1.08%)
                smc_llc_flow_initiate           (1.02%)

smc_listen_work                                 (48.17%)
        __mutex_lock.isra.11                    (47.96%)

An ideal SMC-R connection process should only block on network IO
events, but it's quite clear that SMC-R connections are currently
queued on locks most of the time.

The goal of this patch set is to reach that ideal situation, where a
connection is blocked on network IO events for the majority of its lifetime.

There are three big locks here:

1. smc_client_lgr_pending & smc_server_lgr_pending

2. llc_conf_mutex

3. rmbs_lock & sndbufs_lock

And an implementation issue:

1. confirm/delete rkey msgs can't be sent concurrently, although the
   protocol does allow it.

Unfortunately, the above problems together limit the parallelism of
SMC-R connections. If any of them is left unsolved, our goal cannot
be achieved.

After this patch set, we get a near-ideal off-CPU graph, as
follows:

smc_close_passive_work                          (41.58%)
        smcr_buf_unuse                          (41.57%)
                smc_llc_do_delete_rkey          (41.57%)

smc_listen_work                                 (39.10%)
        smc_clc_wait_msg                        (13.18%)
                tcp_recvmsg_locked              (13.18%)
        smc_listen_find_device                  (25.87%)
                smcr_lgr_reg_rmbs               (25.87%)
                        smc_llc_do_confirm_rkey (25.87%)

We can see that most of the waiting time is now spent waiting for network IO
events.
This also gives a measurable performance improvement in our short-lived
connection wrk/nginx benchmark (cN = N concurrent connections; values are
QPS):

+--------------+------+------+-------+--------+------+--------+
|conns/qps     |c4    | c8   | c16   | c32    | c64  | c200   |
+--------------+------+------+-------+--------+------+--------+
|SMC-R before  |9.7k  | 10k  | 10k   | 9.9k   | 9.1k | 8.9k   |
+--------------+------+------+-------+--------+------+--------+
|SMC-R now     |13k   | 19k  | 18k   | 16k    | 15k  | 12k    |
+--------------+------+------+-------+--------+------+--------+
|TCP           |15k   | 35k  | 51k   | 80k    | 100k | 162k   |
+--------------+------+------+-------+--------+------+--------+

The benefit becomes less obvious as the number of connections grows
because of the workqueue. If we change the workqueue to UNBOUND,
we can obtain at least a 4-5x performance improvement, reaching up to half
of TCP. However, this is not an elegant solution, and a proper optimization
will be much more complicated. In any case, we will submit the relevant
optimization patches as soon as possible.

Please note that the premise here is that the lock-related problems
must be solved first; otherwise, no matter how we optimize the workqueue,
there won't be much improvement.

Because there are a lot of related changes to the code, if you have
any questions or suggestions, please let me know.

Thanks
D. Wythe

v1 -> v2:

1. Fix panic in SMC-D scenario
2. Fix lnkc related hashfn calculation exception, caused by operator
   priority
3. Only wake up one connection if the lnk is not active
4. Delete obsolete unlock logic in smc_listen_work()
5. PATCH format, use reverse Christmas tree ordering
6. PATCH format, rename all xxx_lnk_xxx functions to xxx_link_xxx
7. PATCH format, add correct Fixes tags to the patches that fix bugs
8. PATCH format, fix some spelling errors
9. PATCH format, rename slow to do_slow

v2 -> v3:

1. Add SMC-D support and remove the concept of a link cluster, since SMC-D
   has no links at all. Replace it with the lgr decision maker, which
   advises SMC-D and SMC-R on whether to create a new link group.
2. Fix the corruption problem on SMC-D described by PATCH 'fix application
   data exception'.

D. Wythe (10):
  net/smc: remove locks smc_client_lgr_pending and smc_server_lgr_pending
  net/smc: fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending
  net/smc: allow confirm/delete rkey response deliver multiplex
  net/smc: make SMC_LLC_FLOW_RKEY run concurrently
  net/smc: llc_conf_mutex refactor, replace it with rw_semaphore
  net/smc: use read semaphores to reduce unnecessary blocking in smc_buf_create() & smcr_buf_unuse()
  net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs()
  net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore
  net/smc: Fix potential panic dues to unprotected smc_llc_srv_add_link()
  net/smc: fix application data exception

 net/smc/af_smc.c   |  70 ++++----
 net/smc/smc_core.c | 478 +++++++++++++++++++++++++++++++++++++++++++++++------
 net/smc/smc_core.h |  36 +++-
 net/smc/smc_llc.c  | 277 ++++++++++++++++++++++---------
 net/smc/smc_llc.h  |   6 +
 net/smc/smc_wr.c   |  10 --
 net/smc/smc_wr.h   |  10 ++
 7 files changed, 712 insertions(+), 175 deletions(-)