diff mbox series

[net] net/smc: fix application data exception

Message ID 1669450950-27681-1-git-send-email-alibuda@linux.alibaba.com (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series [net] net/smc: fix application data exception | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers fail 2 blamed authors not CCed: guwen@linux.alibaba.com tonylu@linux.alibaba.com; 4 maintainers not CCed: guwen@linux.alibaba.com pabeni@redhat.com tonylu@linux.alibaba.com edumazet@google.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns WARNING: line length of 92 exceeds 80 columns WARNING: line length of 94 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

D. Wythe Nov. 26, 2022, 8:22 a.m. UTC
From: "D. Wythe" <alibuda@linux.alibaba.com>

There is a certain probability that following
exceptions will occur in the wrk benchmark test:

Running 10s test @ http://11.213.45.6:80
  8 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.72ms   13.94ms 245.33ms   94.17%
    Req/Sec     1.96k   713.67     5.41k    75.16%
  155262 requests in 10.10s, 23.10MB read
Non-2xx or 3xx responses: 3

We will find that the error is HTTP 400 error, which is a serious
exception in our test, which means the application data was
corrupted.

Consider the following scenarios:

CPU0                            CPU1

buf_desc->used = 0;
                                cmpxchg(buf_desc->used, 0, 1)
                                deal_with(buf_desc)

memset(buf_desc->cpu_addr,0);

This will cause the data received by a victim connection to be cleared,
thus triggering an HTTP 400 error in the server.

This patch exchange the order between clear used and memset, add
barrier to ensure memory consistency.

Fixes: 1c5526968e27 ("net/smc: Clear memory when release and reuse buffer")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
---
 net/smc/smc_core.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

Comments

D. Wythe Feb. 14, 2023, 12:14 p.m. UTC | #1
Hi, wenjia

This patch of bugfix seems to have been hanging for a long time.
If you have any concerns, please let us know.

Best wishes.
D. Wythe


On 11/26/22 4:22 PM, D.Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> There is a certain probability that following
> exceptions will occur in the wrk benchmark test:
> 
> Running 10s test @ http://11.213.45.6:80
>    8 threads and 64 connections
>    Thread Stats   Avg      Stdev     Max   +/- Stdev
>      Latency     3.72ms   13.94ms 245.33ms   94.17%
>      Req/Sec     1.96k   713.67     5.41k    75.16%
>    155262 requests in 10.10s, 23.10MB read
> Non-2xx or 3xx responses: 3
> 
> We will find that the error is HTTP 400 error, which is a serious
> exception in our test, which means the application data was
> corrupted.
> 
> Consider the following scenarios:
> 
> CPU0                            CPU1
> 
> buf_desc->used = 0;
>                                  cmpxchg(buf_desc->used, 0, 1)
>                                  deal_with(buf_desc)
> 
> memset(buf_desc->cpu_addr,0);
> 
> This will cause the data received by a victim connection to be cleared,
> thus triggering an HTTP 400 error in the server.
> 
> This patch exchange the order between clear used and memset, add
> barrier to ensure memory consistency.
> 
> Fixes: 1c5526968e27 ("net/smc: Clear memory when release and reuse buffer")
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
>   net/smc/smc_core.c | 17 ++++++++---------
>   1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index c305d8d..c19d4b7 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -1120,8 +1120,9 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
>   
>   		smc_buf_free(lgr, is_rmb, buf_desc);
>   	} else {
> -		buf_desc->used = 0;
> -		memset(buf_desc->cpu_addr, 0, buf_desc->len);
> +		/* memzero_explicit provides potential memory barrier semantics */
> +		memzero_explicit(buf_desc->cpu_addr, buf_desc->len);
> +		WRITE_ONCE(buf_desc->used, 0);
>   	}
>   }
>   
> @@ -1132,19 +1133,17 @@ static void smc_buf_unuse(struct smc_connection *conn,
>   		if (!lgr->is_smcd && conn->sndbuf_desc->is_vm) {
>   			smcr_buf_unuse(conn->sndbuf_desc, false, lgr);
>   		} else {
> -			conn->sndbuf_desc->used = 0;
> -			memset(conn->sndbuf_desc->cpu_addr, 0,
> -			       conn->sndbuf_desc->len);
> +			memzero_explicit(conn->sndbuf_desc->cpu_addr, conn->sndbuf_desc->len);
> +			WRITE_ONCE(conn->sndbuf_desc->used, 0);
>   		}
>   	}
>   	if (conn->rmb_desc) {
>   		if (!lgr->is_smcd) {
>   			smcr_buf_unuse(conn->rmb_desc, true, lgr);
>   		} else {
> -			conn->rmb_desc->used = 0;
> -			memset(conn->rmb_desc->cpu_addr, 0,
> -			       conn->rmb_desc->len +
> -			       sizeof(struct smcd_cdc_msg));
> +			memzero_explicit(conn->rmb_desc->cpu_addr,
> +					 conn->rmb_desc->len + sizeof(struct smcd_cdc_msg));
> +			WRITE_ONCE(conn->rmb_desc->used, 0);
>   		}
>   	}
>   }
Wenjia Zhang Feb. 15, 2023, 9:27 a.m. UTC | #2
On 26.11.22 09:22, D.Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> There is a certain probability that following
> exceptions will occur in the wrk benchmark test:
> 
> Running 10s test @ http://11.213.45.6:80
>    8 threads and 64 connections
>    Thread Stats   Avg      Stdev     Max   +/- Stdev
>      Latency     3.72ms   13.94ms 245.33ms   94.17%
>      Req/Sec     1.96k   713.67     5.41k    75.16%
>    155262 requests in 10.10s, 23.10MB read
> Non-2xx or 3xx responses: 3
> 
> We will find that the error is HTTP 400 error, which is a serious
> exception in our test, which means the application data was
> corrupted.
> 
> Consider the following scenarios:
> 
> CPU0                            CPU1
> 
> buf_desc->used = 0;
>                                  cmpxchg(buf_desc->used, 0, 1)
>                                  deal_with(buf_desc)
> 
> memset(buf_desc->cpu_addr,0);
> 
> This will cause the data received by a victim connection to be cleared,
> thus triggering an HTTP 400 error in the server.
> 
> This patch exchange the order between clear used and memset, add
> barrier to ensure memory consistency.
> 
> Fixes: 1c5526968e27 ("net/smc: Clear memory when release and reuse buffer")
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
>   net/smc/smc_core.c | 17 ++++++++---------
>   1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index c305d8d..c19d4b7 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -1120,8 +1120,9 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
>   
>   		smc_buf_free(lgr, is_rmb, buf_desc);
>   	} else {
> -		buf_desc->used = 0;
> -		memset(buf_desc->cpu_addr, 0, buf_desc->len);
> +		/* memzero_explicit provides potential memory barrier semantics */
> +		memzero_explicit(buf_desc->cpu_addr, buf_desc->len);
> +		WRITE_ONCE(buf_desc->used, 0);
>   	}
>   }
>   
> @@ -1132,19 +1133,17 @@ static void smc_buf_unuse(struct smc_connection *conn,
>   		if (!lgr->is_smcd && conn->sndbuf_desc->is_vm) {
>   			smcr_buf_unuse(conn->sndbuf_desc, false, lgr);
>   		} else {
> -			conn->sndbuf_desc->used = 0;
> -			memset(conn->sndbuf_desc->cpu_addr, 0,
> -			       conn->sndbuf_desc->len);
> +			memzero_explicit(conn->sndbuf_desc->cpu_addr, conn->sndbuf_desc->len);
> +			WRITE_ONCE(conn->sndbuf_desc->used, 0);
>   		}
>   	}
>   	if (conn->rmb_desc) {
>   		if (!lgr->is_smcd) {
>   			smcr_buf_unuse(conn->rmb_desc, true, lgr);
>   		} else {
> -			conn->rmb_desc->used = 0;
> -			memset(conn->rmb_desc->cpu_addr, 0,
> -			       conn->rmb_desc->len +
> -			       sizeof(struct smcd_cdc_msg));
> +			memzero_explicit(conn->rmb_desc->cpu_addr,
> +					 conn->rmb_desc->len + sizeof(struct smcd_cdc_msg));
> +			WRITE_ONCE(conn->rmb_desc->used, 0);
>   		}
>   	}
>   }

Hi David,

Thank you for remembering me again about this patch. I did forget to 
answer you, sorry!

My consideration was if memzero_explicit() is necessary in this case. 
But sure, it makes sense, especiall when the dereferencing is in 
somewhere else.

Thank you for the fix!

Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Jakub Kicinski Feb. 15, 2023, 5:31 p.m. UTC | #3
On Wed, 15 Feb 2023 10:27:55 +0100 Wenjia Zhang wrote:
> Hi David,
> 
> Thank you for remembering me again about this patch. I did forget to 
> answer you, sorry!
> 
> My consideration was if memzero_explicit() is necessary in this case. 
> But sure, it makes sense, especiall when the dereferencing is in 
> somewhere else.
> 
> Thank you for the fix!
> 
> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>

Thanks! David, please repost if you'd like the patch to be applied to
the networking tree. The original posting is too old to use.
D. Wythe Feb. 16, 2023, 4:14 a.m. UTC | #4
On 2/16/23 1:31 AM, Jakub Kicinski wrote:
> On Wed, 15 Feb 2023 10:27:55 +0100 Wenjia Zhang wrote:
>> Hi David,
>>
>> Thank you for remembering me again about this patch. I did forget to
>> answer you, sorry!
>>
>> My consideration was if memzero_explicit() is necessary in this case.
>> But sure, it makes sense, especiall when the dereferencing is in
>> somewhere else.
>>
>> Thank you for the fix!
>>
>> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
> 
> Thanks! David, please repost if you'd like the patch to be applied to
> the networking tree. The original posting is too old to use.

Thank you for your reminder.  I will repost it after rebasing.
D. Wythe
diff mbox series

Patch

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index c305d8d..c19d4b7 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1120,8 +1120,9 @@  static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
 
 		smc_buf_free(lgr, is_rmb, buf_desc);
 	} else {
-		buf_desc->used = 0;
-		memset(buf_desc->cpu_addr, 0, buf_desc->len);
+		/* memzero_explicit provides potential memory barrier semantics */
+		memzero_explicit(buf_desc->cpu_addr, buf_desc->len);
+		WRITE_ONCE(buf_desc->used, 0);
 	}
 }
 
@@ -1132,19 +1133,17 @@  static void smc_buf_unuse(struct smc_connection *conn,
 		if (!lgr->is_smcd && conn->sndbuf_desc->is_vm) {
 			smcr_buf_unuse(conn->sndbuf_desc, false, lgr);
 		} else {
-			conn->sndbuf_desc->used = 0;
-			memset(conn->sndbuf_desc->cpu_addr, 0,
-			       conn->sndbuf_desc->len);
+			memzero_explicit(conn->sndbuf_desc->cpu_addr, conn->sndbuf_desc->len);
+			WRITE_ONCE(conn->sndbuf_desc->used, 0);
 		}
 	}
 	if (conn->rmb_desc) {
 		if (!lgr->is_smcd) {
 			smcr_buf_unuse(conn->rmb_desc, true, lgr);
 		} else {
-			conn->rmb_desc->used = 0;
-			memset(conn->rmb_desc->cpu_addr, 0,
-			       conn->rmb_desc->len +
-			       sizeof(struct smcd_cdc_msg));
+			memzero_explicit(conn->rmb_desc->cpu_addr,
+					 conn->rmb_desc->len + sizeof(struct smcd_cdc_msg));
+			WRITE_ONCE(conn->rmb_desc->used, 0);
 		}
 	}
 }