
RDMA/siw: Reuse value read using READ_ONCE instead of re-reading it

Message ID: tencent_32C3AEB0599DF0A0010A862439636CDA2707@qq.com (mailing list archive)
State: Changes Requested
Series: RDMA/siw: Reuse value read using READ_ONCE instead of re-reading it

Commit Message

linke li March 9, 2024, 12:27 p.m. UTC
In siw_orqe_start_rx, the orqe's flag in the if condition is read using
READ_ONCE, checked, and then re-read, voiding all guarantees of the
checks. Reuse the value that was read by READ_ONCE to ensure the
consistency of the flags throughout the function.

Signed-off-by: linke li <lilinke99@qq.com>
---
 drivers/infiniband/sw/siw/siw_qp_rx.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
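The concern in the commit message can be illustrated with a small userspace model. This is a sketch under assumptions, not siw code: READ_ONCE() is approximated by a volatile cast (a simplification of the kernel macro), and demo_entry, DEMO_WQE_VALID, demo_hook, copy_reread(), copy_once() and demo_clear_valid() are hypothetical names. A hook simulates a writer running between the check and the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's READ_ONCE(): a volatile
 * load the compiler must perform exactly once. Not the rwonce.h definition. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

#define DEMO_WQE_VALID 0x1u /* hypothetical stand-in for SIW_WQE_VALID */

struct demo_entry {
	uint16_t flags;
};

/* Hypothetical hook simulating a writer that runs between check and copy. */
typedef void (*demo_hook)(struct demo_entry *e);

/* Original pattern: check via READ_ONCE(), then load flags again. */
static int copy_reread(struct demo_entry *e, demo_hook hook, uint16_t *out)
{
	assert(e && out);
	if (READ_ONCE(e->flags) & DEMO_WQE_VALID) {
		if (hook)
			hook(e);
		*out = e->flags; /* second, independent load */
		return 1;
	}
	return 0;
}

/* Patched pattern: load once, use the local for both check and copy. */
static int copy_once(struct demo_entry *e, demo_hook hook, uint16_t *out)
{
	uint16_t flags;

	assert(e && out);
	flags = READ_ONCE(e->flags);
	if (flags & DEMO_WQE_VALID) {
		if (hook)
			hook(e);
		*out = flags; /* exactly the value that passed the check */
		return 1;
	}
	return 0;
}

/* Simulated concurrent writer: clears the valid bit. */
static void demo_clear_valid(struct demo_entry *e)
{
	e->flags &= (uint16_t)~DEMO_WQE_VALID;
}
```

With no hook, both functions copy the same value. With demo_clear_valid() as the hook, copy_reread() stores flags without the valid bit even though its check passed, while copy_once() stores the value that was checked. Whether such an intervening write can actually happen for siw's ORQ is the question the rest of the thread addresses.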

Comments

Zhu Yanjun March 10, 2024, 4:53 a.m. UTC | #1
On 2024/3/9 13:27, linke li wrote:
> In siw_orqe_start_rx, the orqe's flag in the if condition is read using
> READ_ONCE, checked, and then re-read, voiding all guarantees of the
> checks. Reuse the value that was read by READ_ONCE to ensure the
> consistency of the flags throughout the function.
> 
> Signed-off-by: linke li <lilinke99@qq.com>
> ---
>   drivers/infiniband/sw/siw/siw_qp_rx.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/siw/siw_qp_rx.c b/drivers/infiniband/sw/siw/siw_qp_rx.c
> index ed4fc39718b4..f5f69de56882 100644
> --- a/drivers/infiniband/sw/siw/siw_qp_rx.c
> +++ b/drivers/infiniband/sw/siw/siw_qp_rx.c
> @@ -740,6 +740,7 @@ static int siw_orqe_start_rx(struct siw_qp *qp)
>   {
>   	struct siw_sqe *orqe;
>   	struct siw_wqe *wqe = NULL;
> +	u16 orqe_flags;
>   
>   	if (unlikely(!qp->attrs.orq_size))
>   		return -EPROTO;
> @@ -748,7 +749,8 @@ static int siw_orqe_start_rx(struct siw_qp *qp)
>   	smp_mb();
>   
>   	orqe = orq_get_current(qp);
> -	if (READ_ONCE(orqe->flags) & SIW_WQE_VALID) {

In this if test, READ_ONCE is needed to read orqe->flags. But in this
commit, this READ_ONCE is moved out of the if test.

In a complicated environment, for example when this function is called
many times concurrently while orqe->flags is being changed, I am not
sure whether this will introduce risks.

If you need to ensure the consistency of the flags throughout the
function, maybe the following is better:

if ((orqe_flags = READ_ONCE(orqe->flags)) & SIW_WQE_VALID) {

Thanks,
Zhu Yanjun

> +	orqe_flags = READ_ONCE(orqe->flags);
> +	if (orqe_flags & SIW_WQE_VALID) {
>   		/* RRESP is a TAGGED RDMAP operation */
>   		wqe = rx_wqe(&qp->rx_tagged);
>   		wqe->sqe.id = orqe->id;
> @@ -756,7 +758,7 @@ static int siw_orqe_start_rx(struct siw_qp *qp)
>   		wqe->sqe.sge[0].laddr = orqe->sge[0].laddr;
>   		wqe->sqe.sge[0].lkey = orqe->sge[0].lkey;
>   		wqe->sqe.sge[0].length = orqe->sge[0].length;
> -		wqe->sqe.flags = orqe->flags;
> +		wqe->sqe.flags = orqe_flags;
>   		wqe->sqe.num_sge = 1;
>   		wqe->bytes = orqe->sge[0].length;
>   		wqe->processed = 0;
Leon Romanovsky March 10, 2024, 11:33 a.m. UTC | #2
On Sat, Mar 09, 2024 at 08:27:16PM +0800, linke li wrote:
> In siw_orqe_start_rx, the orqe's flag in the if condition is read using
> READ_ONCE, checked, and then re-read, voiding all guarantees of the
> checks. Reuse the value that was read by READ_ONCE to ensure the
> consistency of the flags throughout the function.

Please read the comments in include/asm-generic/rwonce.h on when
READ_ONCE() should be used. There is no value in caching the output of
READ_ONCE().

Thanks
linke li March 10, 2024, 12:15 p.m. UTC | #3
I want to emphasize that if the value of orqe->flags has changed by the
time of the second read, the value read will not satisfy the if condition,
causing inconsistency. Note that a READ_ONCE() is already used there.
linke li March 10, 2024, 12:36 p.m. UTC | #4
> In a complicated environment, for example, this function is called many 
> times at the same time and orqe->flags is changed at the same time, I am 
> not sure if this will introduce risks or not.

I think one purpose of READ_ONCE is to read a valid value while the value
may change concurrently. And there is an smp_mb() above the READ_ONCE,
which means that the READ_ONCE is well ordered. I think it is reasonably
safe here.

> if you need to ensure the consistency of the flags throughout the function, not sure if the following is better or not.

> if (((orqe_flags=READ_ONCE(orqe->flags))) & SIW_WQE_VALID) {

This patch does exactly the same thing. The only difference, I think, is
code style.

Thanks,
Linke
Greg Sword March 10, 2024, 5 p.m. UTC | #5
On Sun, Mar 10, 2024 at 8:36 PM linke li <lilinke99@qq.com> wrote:
>
> > In a complicated environment, for example, this function is called many
> > times at the same time and orqe->flags is changed at the same time, I am
> > not sure if this will introduce risks or not.
>
> I think one function of READ_ONCE is to read a valid value while the value
> may change concurrently. And there is a smp() above the READ_ONCE, which
> means that the READ_ONCE is well ordered. I think it is kind of safe here.

This is not an SMP problem. Compared with the original source, your
commit introduces a time slot.

>
> > if you need to ensure the consistency of the flags throughout the function, not sure if the following is better or not.
>
> > if (((orqe_flags=READ_ONCE(orqe->flags))) & SIW_WQE_VALID) {
>
> This patch looks like exactly do the same things. The only difference I
> think is the code style.

No.

>
> Thanks,
> Linke
>
>
Greg Sword March 10, 2024, 5:02 p.m. UTC | #6
On Sun, Mar 10, 2024 at 7:33 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Sat, Mar 09, 2024 at 08:27:16PM +0800, linke li wrote:
> > In siw_orqe_start_rx, the orqe's flag in the if condition is read using
> > READ_ONCE, checked, and then re-read, voiding all guarantees of the
> > checks. Reuse the value that was read by READ_ONCE to ensure the
> > consistency of the flags throughout the function.
>
> Please read include/asm-generic/rwonce.h comments when READ_ONCE() is used.
> There is no value in caching the output of READ_ONCE().

Agreed. Read https://www.kernel.org/doc/Documentation/memory-barriers.txt,
too.

>
> Thanks
>
Leon Romanovsky March 10, 2024, 7:19 p.m. UTC | #7
On Sun, Mar 10, 2024 at 08:15:25PM +0800, linke li wrote:
> I want to emphasize that if the value of orqe->flags has changed by the
> time of the second read, the value read will not satisfy the if condition,
> causing inconsistency. Given that there is already a READ_ONCE.

If the value can change between subsequent reads, then you need to use
locks to make sure that it doesn't happen. Using READ_ONCE() doesn't solve
the concurrency issue, but makes sure that the compiler doesn't reorder
reads and writes.

Thanks
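The lock-based approach Leon describes can be sketched in userspace as follows. The pthread mutex is an assumption standing in for a kernel spinlock or mutex, and demo_locked_entry, demo_set_flags() and demo_copy_locked() are hypothetical names. Because the writer and the reader take the same lock, flags cannot change between the reader's check and its copy.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define DEMO_WQE_VALID 0x1u /* hypothetical stand-in for SIW_WQE_VALID */

struct demo_locked_entry {
	pthread_mutex_t lock; /* stands in for a kernel lock */
	uint16_t flags;
	uint64_t id;
};

/* Writer side: every update to flags happens under the lock. */
static void demo_set_flags(struct demo_locked_entry *e, uint16_t flags)
{
	pthread_mutex_lock(&e->lock);
	e->flags = flags;
	pthread_mutex_unlock(&e->lock);
}

/* Reader side: check and copy inside one critical section, so no writer
 * can change flags in between. Returns 1 if the entry was valid. */
static int demo_copy_locked(struct demo_locked_entry *e,
			    uint16_t *flags_out, uint64_t *id_out)
{
	int valid = 0;

	assert(e && flags_out && id_out);
	pthread_mutex_lock(&e->lock);
	if (e->flags & DEMO_WQE_VALID) {
		*flags_out = e->flags;
		*id_out = e->id;
		valid = 1;
	}
	pthread_mutex_unlock(&e->lock);
	return valid;
}
```

The lines shown in the patch do not take a lock around the check in siw_orqe_start_rx(); Bernard's reply explains why the ORQ protocol makes that unnecessary for an entry that has been found valid.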
linke li March 11, 2024, 2:34 a.m. UTC | #8
> If value can change between subsequent reads, then you need to use locks
> to make sure that it doesn't happen. Using READ_ONCE() doesn't solve the
> concurrency issue, but makes sure that compiler doesn't reorder reads
> and writes.

This code does not need to prevent other threads from writing to the flags.

This topic has been discussed at length [1]; to quote from it:

    (READ_ONCE and WRITE_ONCE)
    That's often useful - lots of code doesn't really care if you get the
    old or the new value, but the code *does* care that it gets *one*
    value, and not some random mix of "I tested one value for validity,
    then it got reloaded due to register pressure, and I actually used
    another value".

    And not some "I read one value, and it was a mix of two other values".
 
In the original code, the first read seems to do the same thing, so
READ_ONCE is probably fine there.

I just want to make sure the flags stored to wqe->sqe.flags is consistent
with the read used in the if condition.

[1] https://lore.kernel.org/lkml/CAHk-=wgG6Dmt1JTXDbrbXh_6s2yLjL=9pHo7uv0==LHFD+aBtg@mail.gmail.com/
linke li March 11, 2024, 2:57 a.m. UTC | #9
> This is not a smp problem. Compared with the original source, your
> commit introduces a time slot.

I don't know what you mean by a time slot. At the binary level, they
produce the same code.
Zhu Yanjun March 11, 2024, 5:11 a.m. UTC | #10
On 2024/3/11 3:34, linke li wrote:
>> If value can change between subsequent reads, then you need to use locks
>> to make sure that it doesn't happen. Using READ_ONCE() doesn't solve the
>> concurrency issue, but makes sure that compiler doesn't reorder reads
>> and writes.
> 
> This code do not need to prevent other thread from writing on the flags.
> 
> This topic got quite a bit of discussion [1], quote from it:
> 
>      (READ_ONCE and WRITE_ONCE)
>      That's often useful - lots of code doesn't really care if you get the
>      old or the new value, but the code *does* care that it gets *one*
>      value, and not some random mix of "I tested one value for validity,
>      then it got reloaded due to register pressure, and I actually used
>      another value".
> 
>      And not some "I read one value, and it was a mix of two other values".
>   
>  From the original code, the first read seems to do the same things. So
> READ_ONCE is probably ok here.
> 
> I just want to make sure the flags stored to wqe->sqe.flags is consistent
> with the read used in the if condition.

Sure. Following Leon's advice, to make this ("wqe->sqe.flags is consistent
with the read used in the if condition") happen, you need a lock to
ensure it. The lock can be a spinlock or a mutex, depending on whether
the code may sleep.

Judging from the original source code, wqe->sqe.flags should be a volatile
variable. It should be read from memory, not from a cached copy.

Zhu Yanjun

> 
> [1]https://lore.kernel.org/lkml/CAHk-=wgG6Dmt1JTXDbrbXh_6s2yLjL=9pHo7uv0==LHFD+aBtg@mail.gmail.com/
>
Zhu Yanjun March 11, 2024, 8:17 a.m. UTC | #11
In the original source code, READ_ONCE(xxx) is in the if test. In your
commit, you move READ_ONCE out of this if test.

So a time slot exists between fetching and using the value; in the
original source code, it does not exist. And the fetching and using are
not protected by locks, as Leon suggested they should be.

This will introduce risks.

The generated binary depends on the optimization level and architecture.
It is very complicated.

Zhu Yanjun

On 11.03.24 03:57, linke li wrote:
>> This is not a smp problem. Compared with the original source, your
>> commit introduces a time slot.
> I don't know what do you mean by a time slot. In the binary level, they
> have the same code.
>
Bernard Metzler March 11, 2024, 2:14 p.m. UTC | #12
> -----Original Message-----
> From: linke li <lilinke99@qq.com>
> Sent: Saturday, March 9, 2024 1:27 PM
> Cc: lilinke99@qq.com; Bernard Metzler <BMT@zurich.ibm.com>; Jason Gunthorpe
> <jgg@ziepe.ca>; Leon Romanovsky <leon@kernel.org>;
> linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [EXTERNAL] [PATCH] RDMA/siw: Reuse value read using READ_ONCE
> instead of re-reading it
> 
> In siw_orqe_start_rx, the orqe's flag in the if condition is read using
> READ_ONCE, checked, and then re-read, voiding all guarantees of the
> checks. Reuse the value that was read by READ_ONCE to ensure the
> consistency of the flags throughout the function.
> 
> Signed-off-by: linke li <lilinke99@qq.com>
> ---
>  drivers/infiniband/sw/siw/siw_qp_rx.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/siw/siw_qp_rx.c b/drivers/infiniband/sw/siw/siw_qp_rx.c
> index ed4fc39718b4..f5f69de56882 100644
> --- a/drivers/infiniband/sw/siw/siw_qp_rx.c
> +++ b/drivers/infiniband/sw/siw/siw_qp_rx.c
> @@ -740,6 +740,7 @@ static int siw_orqe_start_rx(struct siw_qp *qp)
>  {
>  	struct siw_sqe *orqe;
>  	struct siw_wqe *wqe = NULL;
> +	u16 orqe_flags;
> 
>  	if (unlikely(!qp->attrs.orq_size))
>  		return -EPROTO;
> @@ -748,7 +749,8 @@ static int siw_orqe_start_rx(struct siw_qp *qp)
>  	smp_mb();
> 
>  	orqe = orq_get_current(qp);
> -	if (READ_ONCE(orqe->flags) & SIW_WQE_VALID) {
> +	orqe_flags = READ_ONCE(orqe->flags);
> +	if (orqe_flags & SIW_WQE_VALID) {
>  		/* RRESP is a TAGGED RDMAP operation */
>  		wqe = rx_wqe(&qp->rx_tagged);
>  		wqe->sqe.id = orqe->id;
> @@ -756,7 +758,7 @@ static int siw_orqe_start_rx(struct siw_qp *qp)
>  		wqe->sqe.sge[0].laddr = orqe->sge[0].laddr;
>  		wqe->sqe.sge[0].lkey = orqe->sge[0].lkey;
>  		wqe->sqe.sge[0].length = orqe->sge[0].length;
> -		wqe->sqe.flags = orqe->flags;
> +		wqe->sqe.flags = orqe_flags;
>  		wqe->sqe.num_sge = 1;
>  		wqe->bytes = orqe->sge[0].length;
>  		wqe->processed = 0;
> --
> 2.39.3 (Apple Git-146)
> 
> 

The outbound read queue (orq) is a ring buffer with only one
consumer (this code) and one producer (the READ.request sending
code). There is no parallel reader, and there is only a single writer.

The producer (sender of the READ.request) sets the orq entry
valid and does this only once after completely writing
the entry. It does it under qp->orq_lock.

Only if we find the orq entry valid, its content gets copied
at the beginning of a new READ.response (this code).

The orq entry remains valid to stop the producer from re-using
it until the complete READ.response has been received (may be
multiple fragments). The flag gets cleared under qp->orq_lock
after the complete READ.response has been received, or the
response was invalid.


There is no possibility a valid orq entry gets invalidated
after it has been found valid, so it is safe to copy all its
members. 

Thanks,
Bernard.
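Bernard's description of the ORQ can be modelled as a single-producer, single-consumer ring in userspace. This is a sketch under stated assumptions, not the siw implementation: all names (demo_orq, demo_orqe, demo_orq_post(), demo_orq_start(), demo_orq_complete(), DEMO_ORQ_SIZE, DEMO_WQE_VALID) are hypothetical, a pthread mutex stands in for qp->orq_lock, and C11 release/acquire atomics are swapped in for the kernel's smp_mb() and READ_ONCE(). It shows why, under those rules, an entry seen as valid cannot be invalidated or refilled before the consumer has copied it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ORQ_SIZE 4u     /* hypothetical ring size */
#define DEMO_WQE_VALID 0x1u  /* hypothetical stand-in for SIW_WQE_VALID */

struct demo_orqe {
	_Atomic uint16_t flags; /* valid bit: set by producer, cleared by consumer */
	uint64_t id;
	uint32_t length;
};

struct demo_orq {
	pthread_mutex_t lock;   /* stands in for qp->orq_lock */
	struct demo_orqe ring[DEMO_ORQ_SIZE];
	unsigned int put;       /* producer index, only touched by the producer */
	unsigned int get;       /* consumer index, only touched by the consumer */
};

static void demo_orq_init(struct demo_orq *q)
{
	pthread_mutex_init(&q->lock, NULL);
	for (unsigned int i = 0; i < DEMO_ORQ_SIZE; i++) {
		atomic_init(&q->ring[i].flags, 0);
		q->ring[i].id = 0;
		q->ring[i].length = 0;
	}
	q->put = 0;
	q->get = 0;
}

/* Producer: never reuse a slot that is still valid. Fill the entry
 * completely, then publish it by setting the valid bit under the lock. */
static int demo_orq_post(struct demo_orq *q, uint64_t id, uint32_t length)
{
	struct demo_orqe *e = &q->ring[q->put % DEMO_ORQ_SIZE];
	int posted = 0;

	pthread_mutex_lock(&q->lock);
	if (!(atomic_load_explicit(&e->flags, memory_order_relaxed) & DEMO_WQE_VALID)) {
		e->id = id;
		e->length = length;
		/* release: the stores above become visible before the valid bit */
		atomic_store_explicit(&e->flags, DEMO_WQE_VALID, memory_order_release);
		q->put++;
		posted = 1;
	}
	pthread_mutex_unlock(&q->lock);
	return posted;
}

/* Consumer, start of a response: one acquire load of flags. If the entry
 * is valid, copy it; the producer never modifies a valid entry. */
static int demo_orq_start(struct demo_orq *q, uint64_t *id, uint32_t *length)
{
	struct demo_orqe *e = &q->ring[q->get % DEMO_ORQ_SIZE];

	assert(id && length);
	if (!(atomic_load_explicit(&e->flags, memory_order_acquire) & DEMO_WQE_VALID))
		return 0;
	*id = e->id;
	*length = e->length;
	return 1;
}

/* Consumer, end of a response: clear the valid bit under the lock so the
 * producer may reuse the slot, then advance to the next entry. */
static void demo_orq_complete(struct demo_orq *q)
{
	struct demo_orqe *e = &q->ring[q->get % DEMO_ORQ_SIZE];

	pthread_mutex_lock(&q->lock);
	atomic_store_explicit(&e->flags, 0, memory_order_relaxed);
	pthread_mutex_unlock(&q->lock);
	q->get++;
}
```

The producer's refusal to touch a valid slot is what makes the consumer's plain reads of id and length safe, and the valid bit is only cleared by the consumer itself in demo_orq_complete(). The kernel memory model is not identical to C11's, so this illustrates the protocol rather than the exact barriers siw uses.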
linke li March 12, 2024, 1:30 a.m. UTC | #13
Thank you for your reasonable reply. That makes sense. But you might still
consider improving it, as this patch does, by reading the flag only once.
That would avoid some potential risks. However, it is the maintainer's
choice.

Linke
Thanks
Leon Romanovsky March 12, 2024, 7:57 a.m. UTC | #14
On Tue, Mar 12, 2024 at 09:30:53AM +0800, linke li wrote:
> Thank you for your reasonal reply. That makes sense. But you may still
> consider to make it better, like this patch, to read the flag only one 
> time. It will avoid some potential risks. However, it depends on 
> maintainer's choice.

The maintainer doesn't see any potential risks, and the value is read only once anyway.

Thanks

> 
> Linke
> Thanks
> 
>

Patch

diff --git a/drivers/infiniband/sw/siw/siw_qp_rx.c b/drivers/infiniband/sw/siw/siw_qp_rx.c
index ed4fc39718b4..f5f69de56882 100644
--- a/drivers/infiniband/sw/siw/siw_qp_rx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_rx.c
@@ -740,6 +740,7 @@  static int siw_orqe_start_rx(struct siw_qp *qp)
 {
 	struct siw_sqe *orqe;
 	struct siw_wqe *wqe = NULL;
+	u16 orqe_flags;
 
 	if (unlikely(!qp->attrs.orq_size))
 		return -EPROTO;
@@ -748,7 +749,8 @@  static int siw_orqe_start_rx(struct siw_qp *qp)
 	smp_mb();
 
 	orqe = orq_get_current(qp);
-	if (READ_ONCE(orqe->flags) & SIW_WQE_VALID) {
+	orqe_flags = READ_ONCE(orqe->flags);
+	if (orqe_flags & SIW_WQE_VALID) {
 		/* RRESP is a TAGGED RDMAP operation */
 		wqe = rx_wqe(&qp->rx_tagged);
 		wqe->sqe.id = orqe->id;
@@ -756,7 +758,7 @@  static int siw_orqe_start_rx(struct siw_qp *qp)
 		wqe->sqe.sge[0].laddr = orqe->sge[0].laddr;
 		wqe->sqe.sge[0].lkey = orqe->sge[0].lkey;
 		wqe->sqe.sge[0].length = orqe->sge[0].length;
-		wqe->sqe.flags = orqe->flags;
+		wqe->sqe.flags = orqe_flags;
 		wqe->sqe.num_sge = 1;
 		wqe->bytes = orqe->sge[0].length;
 		wqe->processed = 0;