[net-next,00/10] net/smc: optimize the parallelism of SMC-R connections

Message ID	cover.1660152975.git.alibuda@linux.alibaba.com (mailing list archive)
Headers	show Return-Path: <linux-rdma-owner@kernel.org> From: "D. Wythe" <alibuda@linux.alibaba.com> To: kgraul@linux.ibm.com, wenjia@linux.ibm.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org Subject: [PATCH net-next 00/10] net/smc: optimize the parallelism of SMC-R connections Date: Thu, 11 Aug 2022 01:47:31 +0800 Message-Id: <cover.1660152975.git.alibuda@linux.alibaba.com> Precedence: bulk
Series	net/smc: optimize the parallelism of SMC-R connections \| expand [net-next,00/10] net/smc: optimize the parallelism of SMC-R connections [net-next,01/10] net/smc: remove locks smc_client_lgr_pending and smc_server_lgr_pending [net-next,02/10] net/smc: fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending [net-next,03/10] net/smc: allow confirm/delete rkey response deliver multiplex [net-next,04/10] net/smc: make SMC_LLC_FLOW_RKEY run concurrently [net-next,05/10] net/smc: llc_conf_mutex refactor, replace it with rw_semaphore [net-next,06/10] net/smc: use read semaphores to reduce unnecessary blocking in smc_buf_create() & … [net-next,07/10] net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs() [net-next,08/10] net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore [net-next,09/10] net/smc: fix potential panic dues to unprotected smc_llc_srv_add_link() [net-next,10/10] net/smc: fix application data exception

D. Wythe Aug. 10, 2022, 5:47 p.m. UTC

From: "D. Wythe" <alibuda@linux.alibaba.com>

This patch set attempts to optimize the parallelism of SMC-R connections,
mainly to reduce unnecessary blocking on locks, and to fix exceptions that
occur after thoses optimization.

According to Off-CPU graph, SMC worker's off-CPU as that: 

smc_close_passive_work			(1.09%)
	smcr_buf_unuse			(1.08%)
		smc_llc_flow_initiate	(1.02%)
	
smc_listen_work 			(48.17%)
	__mutex_lock.isra.11 		(47.96%)


An ideal SMC-R connection process should only block on the IO events
of the network, but it's quite clear that the SMC-R connection now is
queued on the lock most of the time.

The goal of this patchset is to achieve our ideal situation where
network IO events are blocked for the majority of the connection lifetime.

There are three big locks here:

1. smc_client_lgr_pending & smc_server_lgr_pending

2. llc_conf_mutex

3. rmbs_lock & sndbufs_lock

And an implementation issue:

1. confirm/delete rkey msg can't be sent concurrently while
protocol allows indeed.

Unfortunately,The above problems together affect the parallelism of
SMC-R connection. If any of them are not solved. our goal cannot
be achieved.

After this patch set, we can get a quite ideal off-CPU graph as
following:

smc_close_passive_work					(41.58%)
	smcr_buf_unuse					(41.57%)
		smc_llc_do_delete_rkey			(41.57%)

smc_listen_work						(39.10%)
	smc_clc_wait_msg				(13.18%)
		tcp_recvmsg_locked			(13.18)
	smc_listen_find_device				(25.87%)
		smcr_lgr_reg_rmbs			(25.87%)
			smc_llc_do_confirm_rkey		(25.87%)

We can see that most of the waiting times are waiting for network IO
events. This also has a certain performance improvement on our
short-lived conenction wrk/nginx benchmark test:

+--------------+------+------+-------+--------+------+--------+
|conns/qps     |c4    | c8   |  c16  |  c32   | c64  |  c200  |
+--------------+------+------+-------+--------+------+--------+
|SMC-R before  |9.7k  | 10k  |  10k  |  9.9k  | 9.1k |  8.9k  | 
+--------------+------+------+-------+--------+------+--------+
|SMC-R now     |13k   | 19k  |  18k  |  16k   | 15k  |  12k   |
+--------------+------+------+-------+--------+------+--------+
|TCP	       |15k   | 35k  |  51k  |  80k   | 100k |  162k  |
+--------------+------+------+-------+--------+------+--------+

The reason why the benefit is not obvious after the number of connections has
increased dues to workqueue. If we try to change workqueue to WQ_UNBOUND, we can
obtain at least 4-5 times performance improvement, can reach up to half of TCP.
However, this is not an elegant solution, the optimization of it will be much
more complicated. But in any case, we will submit relevant optimization
patches as soon as possible.

Please note that the premise here is that the lock related problem
must be solved first, otherwise, no matter how we optimize the workqueue,
there won't be much improvement. 

Because there are a lot of related changes to the code, if you have any questions
or suggestions, please let me know.

Thanks
D. Wythe

D. Wythe (10):
  net/smc: remove locks smc_client_lgr_pending and
    smc_server_lgr_pending
  net/smc: fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending
  net/smc: allow confirm/delete rkey response deliver multiplex
  net/smc: make SMC_LLC_FLOW_RKEY run concurrently
  net/smc: llc_conf_mutex refactor, replace it with rw_semaphore
  net/smc: use read semaphores to reduce unnecessary blocking in
    smc_buf_create() & smcr_buf_unuse()
  net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs()
  net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore
  net/smc: fix potential panic dues to unprotected
    smc_llc_srv_add_link()
  net/smc: fix application data exception

 net/smc/af_smc.c   |  40 +++--
 net/smc/smc_core.c | 447 +++++++++++++++++++++++++++++++++++++++++++++++------
 net/smc/smc_core.h |  76 ++++++++-
 net/smc/smc_llc.c  | 286 +++++++++++++++++++++++++---------
 net/smc/smc_llc.h  |   6 +
 net/smc/smc_wr.c   |  10 --
 net/smc/smc_wr.h   |  10 ++
 7 files changed, 728 insertions(+), 147 deletions(-)

Jakub Kicinski Aug. 11, 2022, 3:28 a.m. UTC | #1

On Thu, 11 Aug 2022 01:47:31 +0800 D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> This patch set attempts to optimize the parallelism of SMC-R connections,
> mainly to reduce unnecessary blocking on locks, and to fix exceptions that
> occur after thoses optimization.

net-next is closed until Monday, please see the FAQ.

Also Al Viro complained about the SMC ULP:

https://lore.kernel.org/all/YutBc9aCQOvPPlWN@ZenIV/

I didn't see any responses, what the situation there?

Tony Lu Aug. 11, 2022, 5:13 a.m. UTC | #2

On Wed, Aug 10, 2022 at 08:28:45PM -0700, Jakub Kicinski wrote:
> On Thu, 11 Aug 2022 01:47:31 +0800 D. Wythe wrote:
> > From: "D. Wythe" <alibuda@linux.alibaba.com>
> > 
> > This patch set attempts to optimize the parallelism of SMC-R connections,
> > mainly to reduce unnecessary blocking on locks, and to fix exceptions that
> > occur after thoses optimization.
> 
> net-next is closed until Monday, please see the FAQ.
> 
> Also Al Viro complained about the SMC ULP:
> 
> https://lore.kernel.org/all/YutBc9aCQOvPPlWN@ZenIV/
> 
> I didn't see any responses, what the situation there?

Sorry for the late reply. I am working on it and will give out the
details as soon as possible.

Tony Lu

Karsten Graul Aug. 11, 2022, 12:31 p.m. UTC | #3

On 10/08/2022 19:47, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> This patch set attempts to optimize the parallelism of SMC-R connections,
> mainly to reduce unnecessary blocking on locks, and to fix exceptions that
> occur after thoses optimization.

This are very interesting changes. Please allow us to review and test on 
the s390 architecture. Thank you for this submission!

Jan Karcher Aug. 16, 2022, 9:35 a.m. UTC | #4

On 10.08.2022 19:47, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> This patch set attempts to optimize the parallelism of SMC-R connections,
> mainly to reduce unnecessary blocking on locks, and to fix exceptions that
> occur after thoses optimization.
>

Thank you again for your submission!
Let me give you a quick update from our side:
We tested your patches on top of the net-next kernel on our s390 
systems. They did crash our systems. After verifying our environment we 
pulled console logs and now we can tell that there is indeed a problem 
with your patches regarding SMC-D. So please do not integrate this 
change as of right now. I'm going to do more in depth reviews of your 
patches but i need some time for them so here is a quick a description 
of the problem:

It is a SMC-D problem, that occurs while building up the connection. In 
smc_conn_create you set struct smc_lnk_cluster *lnkc = NULL. For the 
SMC-R path you do grab the pointer, for SMC-D that never happens. Still 
you are using this refernce for SMC-D => Crash. This problem can be 
reproduced using the SMC-D path. Here is an example console output:

[  779.516382] Unable to handle kernel pointer dereference in virtual 
kernel address space
[  779.516389] Failing address: 0000000000000000 TEID: 0000000000000483
[  779.516391] Fault in home space mode while using kernel ASCE.
[  779.516395] AS:0000000069628007 R3:00000000ffbf0007 
S:00000000ffbef800 P:000000000000003d
[  779.516431] Oops: 0004 ilc:2 [#1] SMP
[  779.516436] Modules linked in: tcp_diag inet_diag ism mlx5_ib 
ib_uverbs mlx5_core smc_diag smc ib_core nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv
6 nf_defrag_ipv4 ip_set nf_tables n
[  779.516470] CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 
5.19.0-13940-g22a46254655a #3
[  779.516476] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)

[  779.522738] Workqueue: smc_hs_wq smc_listen_work [smc]
[  779.522755] Krnl PSW : 0704c00180000000 000003ff803da89c 
(smc_conn_create+0x174/0x968 [smc])
[  779.522766]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 
PM:0 RI:0 EA:3
[  779.522770] Krnl GPRS: 0000000000000002 0000000000000000 
0000000000000001 0000000000000000
[  779.522773]            000000008a4128a0 000003ff803f21aa 
000000008e30d640 0000000086d72000
[  779.522776]            0000000086d72000 000000008a412803 
000000008a412800 000000008e30d650
[  779.522779]            0000000080934200 0000000000000000 
000003ff803cb954 00000380002dfa88
[  779.522789] Krnl Code: 000003ff803da88e: e310f0e80024        stg 
%r1,232(%r15)
[  779.522789]            000003ff803da894: a7180000            lhi 
%r1,0
[  779.522789]           #000003ff803da898: 582003ac            l 
%r2,940
[  779.522789]           >000003ff803da89c: ba123020            cs 
%r1,%r2,32(%r3)
[  779.522789]            000003ff803da8a0: ec1603be007e        cij 
%r1,0,6,000003ff803db01c

[  779.522789]            000003ff803da8a6: 4110b002            la 
%r1,2(%r11)
[  779.522789]            000003ff803da8aa: e310f0f00024        stg 
%r1,240(%r15)
[  779.522789]            000003ff803da8b0: e310f0c00004        lg 
%r1,192(%r15)
[  779.522870] Call Trace:
[  779.522873]  [<000003ff803da89c>] smc_conn_create+0x174/0x968 [smc]
[  779.522884]  [<000003ff803cb954>] 
smc_find_ism_v2_device_serv+0x1b4/0x300 [smc]
01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP 
stop from CPU 01.
01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP 
stop from CPU 00.
[  779.522894]  [<000003ff803cbace>] smc_listen_find_device+0x2e/0x370 [smc]

I'm going to send the review for the first patch right away (which is 
the one causing the crash), so far I'm done with it. The others are 
going to follow. Maybe you can look over the problem and come up with a 
solution, otherwise we are going to decide if we want to look into it as 
soon as I'm done with the reviews. Thank you for your patience.

Tony Lu Aug. 16, 2022, 12:40 p.m. UTC | #5

On Tue, Aug 16, 2022 at 11:35:15AM +0200, Jan Karcher wrote:
> 
> 
> On 10.08.2022 19:47, D. Wythe wrote:
> > From: "D. Wythe" <alibuda@linux.alibaba.com>
> > 
> > This patch set attempts to optimize the parallelism of SMC-R connections,
> > mainly to reduce unnecessary blocking on locks, and to fix exceptions that
> > occur after thoses optimization.
> > 
> 
> Thank you again for your submission!
> Let me give you a quick update from our side:
> We tested your patches on top of the net-next kernel on our s390 systems.
> They did crash our systems. After verifying our environment we pulled
> console logs and now we can tell that there is indeed a problem with your
> patches regarding SMC-D. So please do not integrate this change as of right
> now. I'm going to do more in depth reviews of your patches but i need some
> time for them so here is a quick a description of the problem:
> 
> It is a SMC-D problem, that occurs while building up the connection. In
> smc_conn_create you set struct smc_lnk_cluster *lnkc = NULL. For the SMC-R
> path you do grab the pointer, for SMC-D that never happens. Still you are
> using this refernce for SMC-D => Crash. This problem can be reproduced using
> the SMC-D path. Here is an example console output:

Got it.

> 
> [  779.516382] Unable to handle kernel pointer dereference in virtual kernel
> address space
> [  779.516389] Failing address: 0000000000000000 TEID: 0000000000000483
> [  779.516391] Fault in home space mode while using kernel ASCE.
> [  779.516395] AS:0000000069628007 R3:00000000ffbf0007 S:00000000ffbef800
> P:000000000000003d
> [  779.516431] Oops: 0004 ilc:2 [#1] SMP
> [  779.516436] Modules linked in: tcp_diag inet_diag ism mlx5_ib ib_uverbs
> mlx5_core smc_diag smc ib_core nft_fib_inet nft_fib_ipv4
> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv
> 6 nf_defrag_ipv4 ip_set nf_tables n
> [  779.516470] CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted
> 5.19.0-13940-g22a46254655a #3
> [  779.516476] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
> 
> [  779.522738] Workqueue: smc_hs_wq smc_listen_work [smc]
> [  779.522755] Krnl PSW : 0704c00180000000 000003ff803da89c
> (smc_conn_create+0x174/0x968 [smc])
> [  779.522766]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0
> RI:0 EA:3
> [  779.522770] Krnl GPRS: 0000000000000002 0000000000000000 0000000000000001
> 0000000000000000
> [  779.522773]            000000008a4128a0 000003ff803f21aa 000000008e30d640
> 0000000086d72000
> [  779.522776]            0000000086d72000 000000008a412803 000000008a412800
> 000000008e30d650
> [  779.522779]            0000000080934200 0000000000000000 000003ff803cb954
> 00000380002dfa88
> [  779.522789] Krnl Code: 000003ff803da88e: e310f0e80024        stg
> %r1,232(%r15)
> [  779.522789]            000003ff803da894: a7180000            lhi %r1,0
> [  779.522789]           #000003ff803da898: 582003ac            l %r2,940
> [  779.522789]           >000003ff803da89c: ba123020            cs
> %r1,%r2,32(%r3)
> [  779.522789]            000003ff803da8a0: ec1603be007e        cij
> %r1,0,6,000003ff803db01c
> 
> [  779.522789]            000003ff803da8a6: 4110b002            la
> %r1,2(%r11)
> [  779.522789]            000003ff803da8aa: e310f0f00024        stg
> %r1,240(%r15)
> [  779.522789]            000003ff803da8b0: e310f0c00004        lg
> %r1,192(%r15)
> [  779.522870] Call Trace:
> [  779.522873]  [<000003ff803da89c>] smc_conn_create+0x174/0x968 [smc]
> [  779.522884]  [<000003ff803cb954>] smc_find_ism_v2_device_serv+0x1b4/0x300
> [smc]
> 01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop
> from CPU 01.
> 01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop
> from CPU 00.
> [  779.522894]  [<000003ff803cbace>] smc_listen_find_device+0x2e/0x370 [smc]
> 
> 
> I'm going to send the review for the first patch right away (which is the
> one causing the crash), so far I'm done with it. The others are going to
> follow. Maybe you can look over the problem and come up with a solution,
> otherwise we are going to decide if we want to look into it as soon as I'm
> done with the reviews. Thank you for your patience.

Thanks for pointing this issue. We will fix this soon in v2.

Tony Lu

D. Wythe Aug. 17, 2022, 4:55 a.m. UTC | #6

On 8/16/22 5:35 PM, Jan Karcher wrote:
> 
> 
> On 10.08.2022 19:47, D. Wythe wrote:
>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>
>> This patch set attempts to optimize the parallelism of SMC-R connections,
>> mainly to reduce unnecessary blocking on locks, and to fix exceptions that
>> occur after thoses optimization.
>>
> 
> Thank you again for your submission!
> Let me give you a quick update from our side:
> We tested your patches on top of the net-next kernel on our s390 systems. They did crash our systems. After verifying our environment we pulled console logs and now we can tell that there is indeed a problem with your patches regarding SMC-D. So please do not integrate this change as of right now. I'm going to do more in depth reviews of your patches but i need some time for them so here is a quick a description of the problem:

Sorry for the late reply, and thanks a lot for your comment.

I'm sorry for the low-level mistake. In the early design, I hoped that lnkc can also work on SMC-D,
but in later tests I found out that we don't have SMC-D environment to test, so I have to canceled this logic.
But dues to the rollback isn't thorough enough, leaving this issues, we are very sorry for that.


> It is a SMC-D problem, that occurs while building up the connection. In smc_conn_create you set struct smc_lnk_cluster *lnkc = NULL. For the SMC-R path you do grab the pointer, for SMC-D that never happens. Still you are using this refernce for SMC-D => Crash. This problem can be reproduced using the SMC-D path. Here is an example console output:
> 
> [  779.516382] Unable to handle kernel pointer dereference in virtual kernel address space
> [  779.516389] Failing address: 0000000000000000 TEID: 0000000000000483
> [  779.516391] Fault in home space mode while using kernel ASCE.
> [  779.516395] AS:0000000069628007 R3:00000000ffbf0007 S:00000000ffbef800 P:000000000000003d
> [  779.516431] Oops: 0004 ilc:2 [#1] SMP
> [  779.516436] Modules linked in: tcp_diag inet_diag ism mlx5_ib ib_uverbs mlx5_core smc_diag smc ib_core nft_fib_inet nft_fib_ipv4
> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv
> 6 nf_defrag_ipv4 ip_set nf_tables n
> [  779.516470] CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 5.19.0-13940-g22a46254655a #3
> [  779.516476] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
> 
> [  779.522738] Workqueue: smc_hs_wq smc_listen_work [smc]
> [  779.522755] Krnl PSW : 0704c00180000000 000003ff803da89c (smc_conn_create+0x174/0x968 [smc])
> [  779.522766]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [  779.522770] Krnl GPRS: 0000000000000002 0000000000000000 0000000000000001 0000000000000000
> [  779.522773]            000000008a4128a0 000003ff803f21aa 000000008e30d640 0000000086d72000
> [  779.522776]            0000000086d72000 000000008a412803 000000008a412800 000000008e30d650
> [  779.522779]            0000000080934200 0000000000000000 000003ff803cb954 00000380002dfa88
> [  779.522789] Krnl Code: 000003ff803da88e: e310f0e80024        stg %r1,232(%r15)
> [  779.522789]            000003ff803da894: a7180000            lhi %r1,0
> [  779.522789]           #000003ff803da898: 582003ac            l %r2,940
> [  779.522789]           >000003ff803da89c: ba123020            cs %r1,%r2,32(%r3)
> [  779.522789]            000003ff803da8a0: ec1603be007e        cij %r1,0,6,000003ff803db01c
> 
> [  779.522789]            000003ff803da8a6: 4110b002            la %r1,2(%r11)
> [  779.522789]            000003ff803da8aa: e310f0f00024        stg %r1,240(%r15)
> [  779.522789]            000003ff803da8b0: e310f0c00004        lg %r1,192(%r15)
> [  779.522870] Call Trace:
> [  779.522873]  [<000003ff803da89c>] smc_conn_create+0x174/0x968 [smc]
> [  779.522884]  [<000003ff803cb954>] smc_find_ism_v2_device_serv+0x1b4/0x300 [smc]
> 01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 01.
> 01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
> [  779.522894]  [<000003ff803cbace>] smc_listen_find_device+0x2e/0x370 [smc]
> 
> 
> I'm going to send the review for the first patch right away (which is the one causing the crash), so far I'm done with it. The others are going to follow. Maybe you can look over the problem and come up with a solution, otherwise we are going to decide if we want to look into it as soon as I'm done with the reviews. Thank you for your patience.

In the next revision, I will add additional judgment to protect the SMC-D environment,
thanks for your comments.

And Looking forward to your other comments, thanks again.

D. Wythe

Jan Karcher Aug. 17, 2022, 4:52 p.m. UTC | #7

On 17.08.2022 06:55, D. Wythe wrote:
> 
> 
> On 8/16/22 5:35 PM, Jan Karcher wrote:
>>
>>
>> On 10.08.2022 19:47, D. Wythe wrote:
>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>
>>> This patch set attempts to optimize the parallelism of SMC-R 
>>> connections,
>>> mainly to reduce unnecessary blocking on locks, and to fix exceptions 
>>> that
>>> occur after thoses optimization.
>>>
>>
>> Thank you again for your submission!
>> Let me give you a quick update from our side:
>> We tested your patches on top of the net-next kernel on our s390 
>> systems. They did crash our systems. After verifying our environment 
>> we pulled console logs and now we can tell that there is indeed a 
>> problem with your patches regarding SMC-D. So please do not integrate 
>> this change as of right now. I'm going to do more in depth reviews of 
>> your patches but i need some time for them so here is a quick a 
>> description of the problem:
> 
> Sorry for the late reply, and thanks a lot for your comment.
> 
> I'm sorry for the low-level mistake. In the early design, I hoped that 
> lnkc can also work on SMC-D,
> but in later tests I found out that we don't have SMC-D environment to 
> test, so I have to canceled this logic.
> But dues to the rollback isn't thorough enough, leaving this issues, we 
> are very sorry for that.
> 

One more comment:
If the only reason why you do not touch SMC-D is that you do not have 
the environment to test it we strongly encourage you to change it anyway.

At some point doing kernel development, especially driver development 
you are going to reach the point where you do not have the environment 
to test it. It is on the maintainers to test those changes and verify 
that nothing is broken.

So please:
If testing is the only reason change SMC-D as well and we are going to 
test it for you verifying if it does work or not.

Thank you
Jan

> 
>> It is a SMC-D problem, that occurs while building up the connection. 
>> In smc_conn_create you set struct smc_lnk_cluster *lnkc = NULL. For 
>> the SMC-R path you do grab the pointer, for SMC-D that never happens. 
>> Still you are using this refernce for SMC-D => Crash. This problem can 
>> be reproduced using the SMC-D path. Here is an example console output:
>>
>> [  779.516382] Unable to handle kernel pointer dereference in virtual 
>> kernel address space
>> [  779.516389] Failing address: 0000000000000000 TEID: 0000000000000483
>> [  779.516391] Fault in home space mode while using kernel ASCE.
>> [  779.516395] AS:0000000069628007 R3:00000000ffbf0007 
>> S:00000000ffbef800 P:000000000000003d
>> [  779.516431] Oops: 0004 ilc:2 [#1] SMP
>> [  779.516436] Modules linked in: tcp_diag inet_diag ism mlx5_ib 
>> ib_uverbs mlx5_core smc_diag smc ib_core nft_fib_inet nft_fib_ipv4
>> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 
>> nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv
>> 6 nf_defrag_ipv4 ip_set nf_tables n
>> [  779.516470] CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 
>> 5.19.0-13940-g22a46254655a #3
>> [  779.516476] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
>>
>> [  779.522738] Workqueue: smc_hs_wq smc_listen_work [smc]
>> [  779.522755] Krnl PSW : 0704c00180000000 000003ff803da89c 
>> (smc_conn_create+0x174/0x968 [smc])
>> [  779.522766]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 
>> CC:0 PM:0 RI:0 EA:3
>> [  779.522770] Krnl GPRS: 0000000000000002 0000000000000000 
>> 0000000000000001 0000000000000000
>> [  779.522773]            000000008a4128a0 000003ff803f21aa 
>> 000000008e30d640 0000000086d72000
>> [  779.522776]            0000000086d72000 000000008a412803 
>> 000000008a412800 000000008e30d650
>> [  779.522779]            0000000080934200 0000000000000000 
>> 000003ff803cb954 00000380002dfa88
>> [  779.522789] Krnl Code: 000003ff803da88e: e310f0e80024        stg 
>> %r1,232(%r15)
>> [  779.522789]            000003ff803da894: a7180000            lhi %r1,0
>> [  779.522789]           #000003ff803da898: 582003ac            l %r2,940
>> [  779.522789]           >000003ff803da89c: ba123020            cs 
>> %r1,%r2,32(%r3)
>> [  779.522789]            000003ff803da8a0: ec1603be007e        cij 
>> %r1,0,6,000003ff803db01c
>>
>> [  779.522789]            000003ff803da8a6: 4110b002            la 
>> %r1,2(%r11)
>> [  779.522789]            000003ff803da8aa: e310f0f00024        stg 
>> %r1,240(%r15)
>> [  779.522789]            000003ff803da8b0: e310f0c00004        lg 
>> %r1,192(%r15)
>> [  779.522870] Call Trace:
>> [  779.522873]  [<000003ff803da89c>] smc_conn_create+0x174/0x968 [smc]
>> [  779.522884]  [<000003ff803cb954>] 
>> smc_find_ism_v2_device_serv+0x1b4/0x300 [smc]
>> 01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP 
>> stop from CPU 01.
>> 01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP 
>> stop from CPU 00.
>> [  779.522894]  [<000003ff803cbace>] smc_listen_find_device+0x2e/0x370 
>> [smc]
>>
>>
>> I'm going to send the review for the first patch right away (which is 
>> the one causing the crash), so far I'm done with it. The others are 
>> going to follow. Maybe you can look over the problem and come up with 
>> a solution, otherwise we are going to decide if we want to look into 
>> it as soon as I'm done with the reviews. Thank you for your patience.
> 
> In the next revision, I will add additional judgment to protect the 
> SMC-D environment,
> thanks for your comments.
> 
> And Looking forward to your other comments, thanks again.
> 
> D. Wythe
>

D. Wythe Aug. 18, 2022, 1:06 p.m. UTC | #8

On 8/18/22 12:52 AM, Jan Karcher wrote:
> 
> 
> On 17.08.2022 06:55, D. Wythe wrote:
>>
>>
>> On 8/16/22 5:35 PM, Jan Karcher wrote:
>>>
>>>
>>> On 10.08.2022 19:47, D. Wythe wrote:
>>>> From: "D. Wythe" <alibuda@linux.alibaba.com>
>>>>
>>>> This patch set attempts to optimize the parallelism of SMC-R connections,
>>>> mainly to reduce unnecessary blocking on locks, and to fix exceptions that
>>>> occur after thoses optimization.
>>>>
>>>
>>> Thank you again for your submission!
>>> Let me give you a quick update from our side:
>>> We tested your patches on top of the net-next kernel on our s390 systems. They did crash our systems. After verifying our environment we pulled console logs and now we can tell that there is indeed a problem with your patches regarding SMC-D. So please do not integrate this change as of right now. I'm going to do more in depth reviews of your patches but i need some time for them so here is a quick a description of the problem:
>>
>> Sorry for the late reply, and thanks a lot for your comment.
>>
>> I'm sorry for the low-level mistake. In the early design, I hoped that lnkc can also work on SMC-D,
>> but in later tests I found out that we don't have SMC-D environment to test, so I have to canceled this logic.
>> But dues to the rollback isn't thorough enough, leaving this issues, we are very sorry for that.
>>
> 
> One more comment:
> If the only reason why you do not touch SMC-D is that you do not have the environment to test it we strongly encourage you to change it anyway.
> 
> At some point doing kernel development, especially driver development you are going to reach the point where you do not have the environment to test it. It is on the maintainers to test those changes and verify that nothing is broken.
> 
> So please:
> If testing is the only reason change SMC-D as well and we are going to test it for you verifying if it does work or not.
> 
> Thank you
> Jan

Actually, this is not the only reason. The purpose of remove smc_server_lgr_pending & smc_client_lgr_pending
is mainly to solve the problem of excessive lock granularity in SMC-R. In SMC-R those locks protect
a complete CLC message exchange process, including sending and receiving. This results in a large number of
connections having to be queued. But this is not the case with SMC-D. SMC-D releases the lock in advance
before receiving the CLC message, which makes the problem less severe in SMC-D than in SMC-R.

Of course, lnkc can be used for SMC-D, but considering that we have no way to test it,
and it is not the core bottleneck of SMC-D, so we gave up it.

I will fix the panic problem first in the next revison. If you have a strong demand for this feature,
I may commit a separate PATCH to support it, dues to current patch is quite complicated, adding SMC-D support
will exacerbate its complexity, which may affect the other reviewer progress.


Thanks
D. Wythe

>>
>>> It is a SMC-D problem, that occurs while building up the connection. In smc_conn_create you set struct smc_lnk_cluster *lnkc = NULL. For the SMC-R path you do grab the pointer, for SMC-D that never happens. Still you are using this refernce for SMC-D => Crash. This problem can be reproduced using the SMC-D path. Here is an example console output:
>>>
>>> [  779.516382] Unable to handle kernel pointer dereference in virtual kernel address space
>>> [  779.516389] Failing address: 0000000000000000 TEID: 0000000000000483
>>> [  779.516391] Fault in home space mode while using kernel ASCE.
>>> [  779.516395] AS:0000000069628007 R3:00000000ffbf0007 S:00000000ffbef800 P:000000000000003d
>>> [  779.516431] Oops: 0004 ilc:2 [#1] SMP
>>> [  779.516436] Modules linked in: tcp_diag inet_diag ism mlx5_ib ib_uverbs mlx5_core smc_diag smc ib_core nft_fib_inet nft_fib_ipv4
>>> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv
>>> 6 nf_defrag_ipv4 ip_set nf_tables n
>>> [  779.516470] CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 5.19.0-13940-g22a46254655a #3
>>> [  779.516476] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
>>>
>>> [  779.522738] Workqueue: smc_hs_wq smc_listen_work [smc]
>>> [  779.522755] Krnl PSW : 0704c00180000000 000003ff803da89c (smc_conn_create+0x174/0x968 [smc])
>>> [  779.522766]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>>> [  779.522770] Krnl GPRS: 0000000000000002 0000000000000000 0000000000000001 0000000000000000
>>> [  779.522773]            000000008a4128a0 000003ff803f21aa 000000008e30d640 0000000086d72000
>>> [  779.522776]            0000000086d72000 000000008a412803 000000008a412800 000000008e30d650
>>> [  779.522779]            0000000080934200 0000000000000000 000003ff803cb954 00000380002dfa88
>>> [  779.522789] Krnl Code: 000003ff803da88e: e310f0e80024        stg %r1,232(%r15)
>>> [  779.522789]            000003ff803da894: a7180000            lhi %r1,0
>>> [  779.522789]           #000003ff803da898: 582003ac            l %r2,940
>>> [  779.522789]           >000003ff803da89c: ba123020            cs %r1,%r2,32(%r3)
>>> [  779.522789]            000003ff803da8a0: ec1603be007e        cij %r1,0,6,000003ff803db01c
>>>
>>> [  779.522789]            000003ff803da8a6: 4110b002            la %r1,2(%r11)
>>> [  779.522789]            000003ff803da8aa: e310f0f00024        stg %r1,240(%r15)
>>> [  779.522789]            000003ff803da8b0: e310f0c00004        lg %r1,192(%r15)
>>> [  779.522870] Call Trace:
>>> [  779.522873]  [<000003ff803da89c>] smc_conn_create+0x174/0x968 [smc]
>>> [  779.522884]  [<000003ff803cb954>] smc_find_ism_v2_device_serv+0x1b4/0x300 [smc]
>>> 01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 01.
>>> 01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
>>> [  779.522894]  [<000003ff803cbace>] smc_listen_find_device+0x2e/0x370 [smc]
>>>
>>>
>>> I'm going to send the review for the first patch right away (which is the one causing the crash), so far I'm done with it. The others are going to follow. Maybe you can look over the problem and come up with a solution, otherwise we are going to decide if we want to look into it as soon as I'm done with the reviews. Thank you for your patience.
>>
>> In the next revision, I will add additional judgment to protect the SMC-D environment,
>> thanks for your comments.
>>
>> And Looking forward to your other comments, thanks again.
>>
>> D. Wythe
>>

[net-next,00/10] net/smc: optimize the parallelism of SMC-R connections

Message

Comments