diff mbox series

net/rds: Improper memory ordering semantic in release_in_xmit()

Message ID Zehp16cKYeGWknJs@libra05 (mailing list archive)
State RFC
Headers show
Series net/rds: Improper memory ordering semantic in release_in_xmit() | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 944 this patch: 944
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 956 this patch: 956
netdev/verify_signedoff fail author Signed-off-by missing
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 960 this patch: 960
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-03-06--21-00 (tests: 892)

Commit Message

Yewon Choi March 6, 2024, 1:04 p.m. UTC
Hello,

It seems to be that clear_bit() in release_in_xmit() doesn't have
release semantic while it works as a bit lock in rds_send_xmit().
Since acquire/release_in_xmit() are used in rds_send_xmit() for the 
serialization between callers of rds_send_xmit(), they should imply 
acquire/release semantics like other locks.

Although smp_mb__after_atomic() is placed after clear_bit(), it cannot
prevent that instructions before clear_bit() (in critical section) are
reordered after clear_bit().
As a result, mutual exclusion may not be guaranteed in specific
HW architectures like Arm.

We tested that this locking implementation doesn't guarantee the atomicity of
critical section in Arm server. Testing was done with Arm Neoverse N1 cores,
and the testing code was generated by litmus testing tool (klitmus7). 

Initial condition:

l = x = y = r0 = r1 = 0

Thread 0:

if (test_and_set_bit(0, l) == 0) {
    WRITE_ONCE(*x, 1);
    WRITE_ONCE(*y, 1);
    clear_bit(0, l);
    smp_mb__after_atomic();
}

Thread 1:

if (test_and_set_bit(0, l) == 0) {
    r0 = READ_ONCE(*x);
    r1 = READ_ONCE(*y);
    clear_bit(0, l);
    smp_mb__after_atomic();
}

If the implementation is correct, the value of r0 and r1 should show
all-or-nothing behavior (both 0 or 1). However, below test result shows 
that atomicity violation is very rare, but exists:

Histogram (4 states)
9673811 :>1:r0=0; 1:r1=0;
5647    :>1:r0=1; 1:r1=0; // Violate atomicity
9605    :>1:r0=0; 1:r1=1; // Violate atomicity
6310937 :>1:r0=1; 1:r1=1;

So, we suggest introducing release semantic using clear_bit_unlock()
instead of clear_bit():


Could you check this please? If needed, we will send a patch.

Best Regards,
Yewon Choi

Comments

Allison Henderson March 7, 2024, 8:13 p.m. UTC | #1
On Wed, 2024-03-06 at 22:04 +0900, Yewon Choi wrote:
> Hello,
> 
> It seems to be that clear_bit() in release_in_xmit() doesn't have
> release semantic while it works as a bit lock in rds_send_xmit().
> Since acquire/release_in_xmit() are used in rds_send_xmit() for the 
> serialization between callers of rds_send_xmit(), they should imply 
> acquire/release semantics like other locks.
> 
> Although smp_mb__after_atomic() is placed after clear_bit(), it
> cannot
> prevent that instructions before clear_bit() (in critical section)
> are
> reordered after clear_bit().
> As a result, mutual exclusion may not be guaranteed in specific
> HW architectures like Arm.
> 
> We tested that this locking implementation doesn't guarantee the
> atomicity of
> critical section in Arm server. Testing was done with Arm Neoverse N1
> cores,
> and the testing code was generated by litmus testing tool (klitmus7).
> 
> Initial condition:
> 
> l = x = y = r0 = r1 = 0
> 
> Thread 0:
> 
> if (test_and_set_bit(0, l) == 0) {
>     WRITE_ONCE(*x, 1);
>     WRITE_ONCE(*y, 1);
>     clear_bit(0, l);
>     smp_mb__after_atomic();
> }
> 
> Thread 1:
> 
> if (test_and_set_bit(0, l) == 0) {
>     r0 = READ_ONCE(*x);
>     r1 = READ_ONCE(*y);
>     clear_bit(0, l);
>     smp_mb__after_atomic();
> }
> 
> If the implementation is correct, the value of r0 and r1 should show
> all-or-nothing behavior (both 0 or 1). However, below test result
> shows 
> that atomicity violation is very rare, but exists:
> 
> Histogram (4 states)
> 9673811 :>1:r0=0; 1:r1=0;
> 5647    :>1:r0=1; 1:r1=0; // Violate atomicity
> 9605    :>1:r0=0; 1:r1=1; // Violate atomicity
> 6310937 :>1:r0=1; 1:r1=1;
> 
> So, we suggest introducing release semantic using clear_bit_unlock()
> instead of clear_bit():
> 
> diff --git a/net/rds/send.c b/net/rds/send.c
> index 5e57a1581dc6..65b1bb06ca71 100644
> --- a/net/rds/send.c
> +++ b/net/rds/send.c
> @@ -108,7 +108,7 @@ static int acquire_in_xmit(struct rds_conn_path
> *cp)
>  
>  static void release_in_xmit(struct rds_conn_path *cp)
>  {
> -       clear_bit(RDS_IN_XMIT, &cp->cp_flags);
> +       clear_bit_unlock(RDS_IN_XMIT, &cp->cp_flags);
>         smp_mb__after_atomic();
>         /*
>          * We don't use wait_on_bit()/wake_up_bit() because our
> waking is in a
> 
> Could you check this please? If needed, we will send a patch.

Hi Yewon,

Thank you for finding this.  I had a look at the code you had
mentioned, and while I don't see any use cases of release_in_xmit()
that might result in an out of order read, I do think that the proposed
change is a good clean up.  If you choose to submit a patch, please
remove the proceeding "smp_mb__after_atomic" line as well, as it would
no longer be needed.  Also, please update acquire_in_xmit() to use the
corresponding test_and_set_bit_lock() call.  Thank you!

Allison


> 
> Best Regards,
> Yewon Choi
Yewon Choi March 14, 2024, 11:39 a.m. UTC | #2
On Thu, Mar 07, 2024 at 08:13:50PM +0000, Allison Henderson wrote:
> On Wed, 2024-03-06 at 22:04 +0900, Yewon Choi wrote:
> > Hello,
> > 
> > It seems to be that clear_bit() in release_in_xmit() doesn't have
> > release semantic while it works as a bit lock in rds_send_xmit().
> > Since acquire/release_in_xmit() are used in rds_send_xmit() for the 
> > serialization between callers of rds_send_xmit(), they should imply 
> > acquire/release semantics like other locks.
> > 
> > Although smp_mb__after_atomic() is placed after clear_bit(), it
> > cannot
> > prevent that instructions before clear_bit() (in critical section)
> > are
> > reordered after clear_bit().
> > As a result, mutual exclusion may not be guaranteed in specific
> > HW architectures like Arm.
> > 
> > We tested that this locking implementation doesn't guarantee the
> > atomicity of
> > critical section in Arm server. Testing was done with Arm Neoverse N1
> > cores,
> > and the testing code was generated by litmus testing tool (klitmus7).
> > 
> > Initial condition:
> > 
> > l = x = y = r0 = r1 = 0
> > 
> > Thread 0:
> > 
> > if (test_and_set_bit(0, l) == 0) {
> >     WRITE_ONCE(*x, 1);
> >     WRITE_ONCE(*y, 1);
> >     clear_bit(0, l);
> >     smp_mb__after_atomic();
> > }
> > 
> > Thread 1:
> > 
> > if (test_and_set_bit(0, l) == 0) {
> >     r0 = READ_ONCE(*x);
> >     r1 = READ_ONCE(*y);
> >     clear_bit(0, l);
> >     smp_mb__after_atomic();
> > }
> > 
> > If the implementation is correct, the value of r0 and r1 should show
> > all-or-nothing behavior (both 0 or 1). However, below test result
> > shows 
> > that atomicity violation is very rare, but exists:
> > 
> > Histogram (4 states)
> > 9673811 :>1:r0=0; 1:r1=0;
> > 5647    :>1:r0=1; 1:r1=0; // Violate atomicity
> > 9605    :>1:r0=0; 1:r1=1; // Violate atomicity
> > 6310937 :>1:r0=1; 1:r1=1;
> > 
> > So, we suggest introducing release semantic using clear_bit_unlock()
> > instead of clear_bit():
> > 
> > diff --git a/net/rds/send.c b/net/rds/send.c
> > index 5e57a1581dc6..65b1bb06ca71 100644
> > --- a/net/rds/send.c
> > +++ b/net/rds/send.c
> > @@ -108,7 +108,7 @@ static int acquire_in_xmit(struct rds_conn_path
> > *cp)
> >  
> >  static void release_in_xmit(struct rds_conn_path *cp)
> >  {
> > -       clear_bit(RDS_IN_XMIT, &cp->cp_flags);
> > +       clear_bit_unlock(RDS_IN_XMIT, &cp->cp_flags);
> >         smp_mb__after_atomic();
> >         /*
> >          * We don't use wait_on_bit()/wake_up_bit() because our
> > waking is in a
> > 
> > Could you check this please? If needed, we will send a patch.
> 
> Hi Yewon,
> 
> Thank you for finding this.  I had a look at the code you had
> mentioned, and while I don't see any use cases of release_in_xmit()
> that might result in an out of order read, I do think that the proposed
> change is a good clean up.  If you choose to submit a patch, please
> remove the proceeding "smp_mb__after_atomic" line as well, as it would
> no longer be needed.  Also, please update acquire_in_xmit() to use the
> corresponding test_and_set_bit_lock() call.  Thank you!
>

Thank you for examining this and giving suggestions!
I sent a patch with changes including your suggestions. If it has
problems, I will correct them as soon as possible.

Sincerely,
Yewon Choi

> Allison
> 
> 
> > 
> > Best Regards,
> > Yewon Choi
>
diff mbox series

Patch

diff --git a/net/rds/send.c b/net/rds/send.c
index 5e57a1581dc6..65b1bb06ca71 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -108,7 +108,7 @@  static int acquire_in_xmit(struct rds_conn_path *cp)
 
 static void release_in_xmit(struct rds_conn_path *cp)
 {
-	clear_bit(RDS_IN_XMIT, &cp->cp_flags);
+	clear_bit_unlock(RDS_IN_XMIT, &cp->cp_flags);
 	smp_mb__after_atomic();
 	/*
 	 * We don't use wait_on_bit()/wake_up_bit() because our waking is in a