[v2] docs/RCU: Bring smp_wmb() back

Message ID 20230617145346.29009-1-mmpgouride@gmail.com (mailing list archive)
State New, archived
Series: [v2] docs/RCU: Bring smp_wmb() back

Commit Message

Alan Huang June 17, 2023, 2:53 p.m. UTC
There are two memory orderings required in the insertion algorithm:
we need to make sure obj->key is updated before both obj->obj_node.next
and obj->refcnt, and atomic_set_release() is not enough to provide the
required memory barrier.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
---
 Documentation/RCU/rculist_nulls.rst | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Matthew Wilcox June 17, 2023, 5:23 p.m. UTC | #1
On Sat, Jun 17, 2023 at 02:53:46PM +0000, Alan Huang wrote:
> There are two memory ordering required in the insertion algorithm,
> we need to make sure obj->key is updated before obj->obj_node.next
> and obj->refcnt, atomic_set_release is not enough to provide the
> required memory barrier.
> 
> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
> ---
>  Documentation/RCU/rculist_nulls.rst | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
> index e06ed40bb6..77244adbdf 100644
> --- a/Documentation/RCU/rculist_nulls.rst
> +++ b/Documentation/RCU/rculist_nulls.rst
> @@ -98,7 +98,7 @@ Quoting Corey Minyard::
>  ----------------------
>  
>  We need to make sure a reader cannot read the new 'obj->obj_node.next' value
> -and previous value of 'obj->key'. Otherwise, an item could be deleted
> +and previous value of 'obj->key' at the same time. Otherwise, an item could be deleted

"at the same time" doesn't make a lot of sense to me.  CPUs don't do
anything "at the same time".  I think the way this is worded now is
fine; I tried coming up with a few variants of this, but none are as
clear and succinct as what is there now.

>  from a chain, and inserted into another chain. If new chain was empty
>  before the move, 'next' pointer is NULL, and lockless reader can not
>  detect the fact that it missed following items in original chain.
> @@ -112,7 +112,12 @@ detect the fact that it missed following items in original chain.
>    obj = kmem_cache_alloc(...);
>    lock_chain(); // typically a spin_lock()
>    obj->key = key;
> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
> +  /*
> +  * we need to make sure obj->key is updated before obj->obj_node.next
> +  * and obj->refcnt
> +  */
> +  smp_wmb();
> +  atomic_set(&obj->refcnt, 1);

Perhaps this could be a little clearer for those of us who aren't as
deep into the memory model as others ... are you saying that the
atomic_set_release() would only order references to obj->refcount and
would not order accesses to obj->key?  Because reading this:

     The use of ACQUIRE and RELEASE operations generally precludes the need
     for other sorts of memory barrier.  In addition, a RELEASE+ACQUIRE pair is
     -not- guaranteed to act as a full memory barrier.  However, after an
     ACQUIRE on a given variable, all memory accesses preceding any prior
     RELEASE on that same variable are guaranteed to be visible.  In other
     words, within a given variable's critical section, all accesses of all
     previous critical sections for that variable are guaranteed to have
     completed.

makes me think that this example is fine; anybody doing a load-acquire
on obj->refcount will see the update to obj->key that happened before
the store-release to obj->refcount.
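
In code, the guarantee I'm relying on is roughly this (a sketch using the
field names from the doc's example, with atomic_read_acquire() standing in
for the load-acquire):

  /*
   * Reader sketch: if this acquire read observes the release store of 1 to
   * obj->refcnt, then the earlier "obj->key = key" store is guaranteed to
   * be visible as well.
   */
  if (atomic_read_acquire(&obj->refcnt) && obj->key == key) {
          /* obj->key is the updated value here */
  }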

I am not an expert and would welcome the opportunity to learn more here.
Alan Huang June 17, 2023, 5:55 p.m. UTC | #2
> On Jun 18, 2023, at 01:23, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Sat, Jun 17, 2023 at 02:53:46PM +0000, Alan Huang wrote:
>> There are two memory ordering required in the insertion algorithm,
>> we need to make sure obj->key is updated before obj->obj_node.next
>> and obj->refcnt, atomic_set_release is not enough to provide the
>> required memory barrier.
>> 
>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
>> ---
>> Documentation/RCU/rculist_nulls.rst | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>> 
>> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
>> index e06ed40bb6..77244adbdf 100644
>> --- a/Documentation/RCU/rculist_nulls.rst
>> +++ b/Documentation/RCU/rculist_nulls.rst
>> @@ -98,7 +98,7 @@ Quoting Corey Minyard::
>> ----------------------
>> 
>> We need to make sure a reader cannot read the new 'obj->obj_node.next' value
>> -and previous value of 'obj->key'. Otherwise, an item could be deleted
>> +and previous value of 'obj->key' at the same time. Otherwise, an item could be deleted
> 
> "at the same time" doesn't make a lot of sense to me.  CPUs don't do
> anything "at the same time".  I think the way this is worded now is
> fine; I tried coming up with a few variants of this, but none are as
> clear and succinct as what is there now.
> 
>> from a chain, and inserted into another chain. If new chain was empty
>> before the move, 'next' pointer is NULL, and lockless reader can not
>> detect the fact that it missed following items in original chain.
>> @@ -112,7 +112,12 @@ detect the fact that it missed following items in original chain.
>>   obj = kmem_cache_alloc(...);
>>   lock_chain(); // typically a spin_lock()
>>   obj->key = key;
>> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
>> +  /*
>> +  * we need to make sure obj->key is updated before obj->obj_node.next
>> +  * and obj->refcnt
>> +  */
>> +  smp_wmb();
>> +  atomic_set(&obj->refcnt, 1);
> 
> Perhaps this could be a little clearer for those of us who aren't as
> deep into the memory model as others ... are you saying that the
> atomic_set_release() would only order references to obj->refcount and
> would not order accesses to obj->key?  Because reading this:
> 
>     The use of ACQUIRE and RELEASE operations generally precludes the need
>     for other sorts of memory barrier.  In addition, a RELEASE+ACQUIRE pair is
>     -not- guaranteed to act as a full memory barrier.  However, after an
>     ACQUIRE on a given variable, all memory accesses preceding any prior
>     RELEASE on that same variable are guaranteed to be visible.  In other
>     words, within a given variable's critical section, all accesses of all
>     previous critical sections for that variable are guaranteed to have
>     completed.
> 
> makes me think that this example is fine; anybody doing a load-acquire
> on obj->refcount will see the update to obj->key that happened before
> the store-release to obj->refcount.

Two memory orderings are required here; atomic_set_release() only provides one of them (the one you mentioned).

The objects are allocated with SLAB_TYPESAFE_BY_RCU, and there is:

		n->next = first;

in hlist_add_head_rcu(), which modifies obj->obj_node.next.

So, we must make sure obj->key is updated before obj->obj_node.next. Without smp_wmb(), a reader can observe
the new 'obj->obj_node.next' value and the previous value of 'obj->key' at the same time, and in that case it
cannot detect the movement of the object.
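
To make the two orderings concrete, here is a rough sketch (not the code from
rculist_nulls.rst; updater()/reader() are hypothetical helpers using the
example's field names):

  /* Updater: re-inserting a SLAB_TYPESAFE_BY_RCU object under the chain lock. */
  static void updater(struct obj *obj, struct hlist_head *new_head, u32 new_key)
  {
          obj->key = new_key;
          smp_wmb();      /* order ->key before BOTH ->refcnt and ->obj_node.next */
          atomic_set(&obj->refcnt, 1);
          hlist_add_head_rcu(&obj->obj_node, new_head);   /* does n->next = first */
  }

  /* Reader that still holds a stale pointer to obj (SLAB_TYPESAFE_BY_RCU). */
  static bool reader(struct obj *obj, u32 old_key)
  {
          struct hlist_node *next;
          u32 key;

          next = READ_ONCE(obj->obj_node.next);   /* chain traversal reads ->next */
          smp_rmb();                              /* pairs with the updater's smp_wmb() */
          key = READ_ONCE(obj->key);
          /*
           * Forbidden with the smp_wmb()/smp_rmb() pair, allowed without it:
           * next already has its new-chain value (possibly NULL if the new
           * chain was empty) while key is still the old value, so the reader
           * stops or follows the wrong chain without noticing the move.
           */
          return key == old_key && next != NULL;
  }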

The following link might be helpful:

		https://patchwork.ozlabs.org/project/netdev/patch/491C282A.5050802@cosmosbay.com/


> 
> I am not an expert and would welcome the opportunity to learn more here.
Matthew Wilcox June 17, 2023, 6:07 p.m. UTC | #3
On Sun, Jun 18, 2023 at 01:55:10AM +0800, Alan Huang wrote:
> 
> > On Jun 18, 2023, at 01:23, Matthew Wilcox <willy@infradead.org> wrote:
> > 
> > On Sat, Jun 17, 2023 at 02:53:46PM +0000, Alan Huang wrote:
> >> There are two memory ordering required in the insertion algorithm,
> >> we need to make sure obj->key is updated before obj->obj_node.next
> >> and obj->refcnt, atomic_set_release is not enough to provide the
> >> required memory barrier.
> >> 
> >> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
> >> ---
> >> Documentation/RCU/rculist_nulls.rst | 9 +++++++--
> >> 1 file changed, 7 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
> >> index e06ed40bb6..77244adbdf 100644
> >> --- a/Documentation/RCU/rculist_nulls.rst
> >> +++ b/Documentation/RCU/rculist_nulls.rst
> >> @@ -98,7 +98,7 @@ Quoting Corey Minyard::
> >> ----------------------
> >> 
> >> We need to make sure a reader cannot read the new 'obj->obj_node.next' value
> >> -and previous value of 'obj->key'. Otherwise, an item could be deleted
> >> +and previous value of 'obj->key' at the same time. Otherwise, an item could be deleted
> > 
> > "at the same time" doesn't make a lot of sense to me.  CPUs don't do
> > anything "at the same time".  I think the way this is worded now is
> > fine; I tried coming up with a few variants of this, but none are as
> > clear and succinct as what is there now.
> > 
> >> from a chain, and inserted into another chain. If new chain was empty
> >> before the move, 'next' pointer is NULL, and lockless reader can not
> >> detect the fact that it missed following items in original chain.
> >> @@ -112,7 +112,12 @@ detect the fact that it missed following items in original chain.
> >>   obj = kmem_cache_alloc(...);
> >>   lock_chain(); // typically a spin_lock()
> >>   obj->key = key;
> >> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
> >> +  /*
> >> +  * we need to make sure obj->key is updated before obj->obj_node.next
> >> +  * and obj->refcnt
> >> +  */
> >> +  smp_wmb();
> >> +  atomic_set(&obj->refcnt, 1);
> > 
> > Perhaps this could be a little clearer for those of us who aren't as
> > deep into the memory model as others ... are you saying that the
> > atomic_set_release() would only order references to obj->refcount and
> > would not order accesses to obj->key?  Because reading this:
> > 
> >     The use of ACQUIRE and RELEASE operations generally precludes the need
> >     for other sorts of memory barrier.  In addition, a RELEASE+ACQUIRE pair is
> >     -not- guaranteed to act as a full memory barrier.  However, after an
> >     ACQUIRE on a given variable, all memory accesses preceding any prior
> >     RELEASE on that same variable are guaranteed to be visible.  In other
> >     words, within a given variable's critical section, all accesses of all
> >     previous critical sections for that variable are guaranteed to have
> >     completed.
> > 
> > makes me think that this example is fine; anybody doing a load-acquire
> > on obj->refcount will see the update to obj->key that happened before
> > the store-release to obj->refcount.
> 
> Two memory ordering required here, atomic_set_release only provides one of them (the one you mentioned)
> 
> The objects are allocated with SLAB_TYPESAFE_BY_RCU, and there is:
> 	
> 		n->next = first;
> 
> in hlist_add_head_rcu, which modifies obj->obj_node.next.

But isn't that the bug?  ie this assignment should be WRITE_ONCE() or
smp_store_release() or rcu_assign_pointer() or whichever of these
similar functions is actually appropriate?

> So, we must make sure obj->key is updated before obj->obj_node.next, without smp_wmb(), we can read
> the new 'obj->obj_node.next’ value and previous value of 'obj->key’ at the same time, and in this case, we
> can not detect the movement of the object.
> 
> The following link might be helpful:
> 
> 		https://patchwork.ozlabs.org/project/netdev/patch/491C282A.5050802@cosmosbay.com/
> 
> 
> > 
> > I am not an expert and would welcome the opportunity to learn more here.
> 
>
Alan Huang June 17, 2023, 6:21 p.m. UTC | #4
> On Jun 18, 2023, at 02:07, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Sun, Jun 18, 2023 at 01:55:10AM +0800, Alan Huang wrote:
>> 
>>> On Jun 18, 2023, at 01:23, Matthew Wilcox <willy@infradead.org> wrote:
>>> 
>>> On Sat, Jun 17, 2023 at 02:53:46PM +0000, Alan Huang wrote:
>>>> There are two memory ordering required in the insertion algorithm,
>>>> we need to make sure obj->key is updated before obj->obj_node.next
>>>> and obj->refcnt, atomic_set_release is not enough to provide the
>>>> required memory barrier.
>>>> 
>>>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
>>>> ---
>>>> Documentation/RCU/rculist_nulls.rst | 9 +++++++--
>>>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
>>>> index e06ed40bb6..77244adbdf 100644
>>>> --- a/Documentation/RCU/rculist_nulls.rst
>>>> +++ b/Documentation/RCU/rculist_nulls.rst
>>>> @@ -98,7 +98,7 @@ Quoting Corey Minyard::
>>>> ----------------------
>>>> 
>>>> We need to make sure a reader cannot read the new 'obj->obj_node.next' value
>>>> -and previous value of 'obj->key'. Otherwise, an item could be deleted
>>>> +and previous value of 'obj->key' at the same time. Otherwise, an item could be deleted
>>> 
>>> "at the same time" doesn't make a lot of sense to me.  CPUs don't do
>>> anything "at the same time".  I think the way this is worded now is
>>> fine; I tried coming up with a few variants of this, but none are as
>>> clear and succinct as what is there now.
>>> 
>>>> from a chain, and inserted into another chain. If new chain was empty
>>>> before the move, 'next' pointer is NULL, and lockless reader can not
>>>> detect the fact that it missed following items in original chain.
>>>> @@ -112,7 +112,12 @@ detect the fact that it missed following items in original chain.
>>>>  obj = kmem_cache_alloc(...);
>>>>  lock_chain(); // typically a spin_lock()
>>>>  obj->key = key;
>>>> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
>>>> +  /*
>>>> +  * we need to make sure obj->key is updated before obj->obj_node.next
>>>> +  * and obj->refcnt
>>>> +  */
>>>> +  smp_wmb();
>>>> +  atomic_set(&obj->refcnt, 1);
>>> 
>>> Perhaps this could be a little clearer for those of us who aren't as
>>> deep into the memory model as others ... are you saying that the
>>> atomic_set_release() would only order references to obj->refcount and
>>> would not order accesses to obj->key?  Because reading this:
>>> 
>>>    The use of ACQUIRE and RELEASE operations generally precludes the need
>>>    for other sorts of memory barrier.  In addition, a RELEASE+ACQUIRE pair is
>>>    -not- guaranteed to act as a full memory barrier.  However, after an
>>>    ACQUIRE on a given variable, all memory accesses preceding any prior
>>>    RELEASE on that same variable are guaranteed to be visible.  In other
>>>    words, within a given variable's critical section, all accesses of all
>>>    previous critical sections for that variable are guaranteed to have
>>>    completed.
>>> 
>>> makes me think that this example is fine; anybody doing a load-acquire
>>> on obj->refcount will see the update to obj->key that happened before
>>> the store-release to obj->refcount.
>> 
>> Two memory ordering required here, atomic_set_release only provides one of them (the one you mentioned)
>> 
>> The objects are allocated with SLAB_TYPESAFE_BY_RCU, and there is:
>> 
>> n->next = first;
>> 
>> in hlist_add_head_rcu, which modifies obj->obj_node.next.
> 
> But isn't that the bug?  ie this assignment should be WRITE_ONCE() or
> smp_store_release() or rcu_assign_pointer() or whichever of these
> similar functions is actually appropriate?

Without SLAB_TYPESAFE_BY_RCU, it’s fine, because it only modifies local data that has not been published yet.

But with SLAB_TYPESAFE_BY_RCU, it might be a bug, though I’m not very sure.
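
For reference, the WRITE_ONCE() variant of that suggestion would be something
like this (body paraphrased from include/linux/rculist.h, not copied verbatim):

  static inline void hlist_add_head_rcu(struct hlist_node *n,
                                        struct hlist_head *h)
  {
          struct hlist_node *first = h->first;

          WRITE_ONCE(n->next, first);     /* was the plain "n->next = first" */
          n->pprev = &h->first;
          rcu_assign_pointer(hlist_first_rcu(h), n);
          if (first)
                  first->pprev = &n->next;
  }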

> 
>> So, we must make sure obj->key is updated before obj->obj_node.next, without smp_wmb(), we can read
>> the new 'obj->obj_node.next’ value and previous value of 'obj->key’ at the same time, and in this case, we
>> can not detect the movement of the object.
>> 
>> The following link might be helpful:
>> 
>> https://patchwork.ozlabs.org/project/netdev/patch/491C282A.5050802@cosmosbay.com/
>> 
>> 
>>> 
>>> I am not an expert and would welcome the opportunity to learn more here.
Paul E. McKenney June 20, 2023, 10:33 p.m. UTC | #5
On Sat, Jun 17, 2023 at 02:53:46PM +0000, Alan Huang wrote:
> There are two memory ordering required in the insertion algorithm,
> we need to make sure obj->key is updated before obj->obj_node.next
> and obj->refcnt, atomic_set_release is not enough to provide the
> required memory barrier.
> 
> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
> ---
>  Documentation/RCU/rculist_nulls.rst | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
> index e06ed40bb6..77244adbdf 100644
> --- a/Documentation/RCU/rculist_nulls.rst
> +++ b/Documentation/RCU/rculist_nulls.rst
> @@ -98,7 +98,7 @@ Quoting Corey Minyard::
>  ----------------------
>  
>  We need to make sure a reader cannot read the new 'obj->obj_node.next' value
> -and previous value of 'obj->key'. Otherwise, an item could be deleted
> +and previous value of 'obj->key' at the same time. Otherwise, an item could be deleted
>  from a chain, and inserted into another chain. If new chain was empty
>  before the move, 'next' pointer is NULL, and lockless reader can not
>  detect the fact that it missed following items in original chain.
> @@ -112,7 +112,12 @@ detect the fact that it missed following items in original chain.
>    obj = kmem_cache_alloc(...);
>    lock_chain(); // typically a spin_lock()
>    obj->key = key;
> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
> +  /*
> +  * we need to make sure obj->key is updated before obj->obj_node.next
> +  * and obj->refcnt
> +  */
> +  smp_wmb();
> +  atomic_set(&obj->refcnt, 1);

You lost me on this one.

From what I can see, in the original the atomic_set_release() provided
ordering between the ->key assignment and the ->refcnt assignment,
and then the rcu_assign_pointer() within hlist_add_head_rcu() provided
ordering between both the ->key and the ->refcnt assignments on the one
hand and the list insertion on the other.

What am I missing here?

							Thanx, Paul

>    hlist_add_head_rcu(&obj->obj_node, list);
>    unlock_chain(); // typically a spin_unlock()
>  
> -- 
> 2.34.1
>
Alan Huang June 21, 2023, 1:24 a.m. UTC | #6
> On Jun 21, 2023, at 06:33, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Sat, Jun 17, 2023 at 02:53:46PM +0000, Alan Huang wrote:
>> There are two memory ordering required in the insertion algorithm,
>> we need to make sure obj->key is updated before obj->obj_node.next
>> and obj->refcnt, atomic_set_release is not enough to provide the
>> required memory barrier.
>> 
>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
>> ---
>> Documentation/RCU/rculist_nulls.rst | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>> 
>> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
>> index e06ed40bb6..77244adbdf 100644
>> --- a/Documentation/RCU/rculist_nulls.rst
>> +++ b/Documentation/RCU/rculist_nulls.rst
>> @@ -98,7 +98,7 @@ Quoting Corey Minyard::
>> ----------------------
>> 
>> We need to make sure a reader cannot read the new 'obj->obj_node.next' value
>> -and previous value of 'obj->key'. Otherwise, an item could be deleted
>> +and previous value of 'obj->key' at the same time. Otherwise, an item could be deleted
>> from a chain, and inserted into another chain. If new chain was empty
>> before the move, 'next' pointer is NULL, and lockless reader can not
>> detect the fact that it missed following items in original chain.
>> @@ -112,7 +112,12 @@ detect the fact that it missed following items in original chain.
>>   obj = kmem_cache_alloc(...);
>>   lock_chain(); // typically a spin_lock()
>>   obj->key = key;
>> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
>> +  /*
>> +  * we need to make sure obj->key is updated before obj->obj_node.next
>> +  * and obj->refcnt
>> +  */
>> +  smp_wmb();
>> +  atomic_set(&obj->refcnt, 1);
> 
> You lost me on this one.
> 
> From what I can see, in the original the atomic_set_release() provided
> ordering between the ->key assignment and the ->refcnt assignment,
> and then the rcu_assign_pointer() within hlist_add_head_rcu() provided
> ordering between both the ->key and the ->refcnt assignments on the one
> hand and the list insertion on the other.
> 
> What am I missing here?

The objects are allocated with SLAB_TYPESAFE_BY_RCU, and there is
n->next = first within hlist_add_head_rcu(), before the rcu_assign_pointer(), which
modifies obj->obj_node.next. There may be readers still holding a reference to obj in
lockless_lookup(), and when the updater modifies ->next, those readers can see the change
immediately because of SLAB_TYPESAFE_BY_RCU.

So, the store n->next = first runs concurrently with the load of pos->next in lockless_lookup().

And since the load of pos->next in lockless_lookup() is followed by the smp_rmb(),
we need to make sure obj->key is updated before obj->obj_node.next (i.e. before the n->next = first store in hlist_add_head_rcu()).

So, the smp_wmb() pairs with the smp_rmb() in lockless_lookup() and with the load-acquire in try_get_ref().
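
A reader-side sketch of that pairing (paraphrased from the rculist_nulls.rst
example rather than copied; the retry loop is replaced by returning NULL, and
try_get_ref() is assumed to be an acquire-flavoured atomic_inc_not_zero()):

  static struct obj *lookup_and_get(struct hlist_head *head, u32 key)
  {
          struct obj *obj;

          rcu_read_lock();
          obj = lockless_lookup(head, key);  /* loads ->next, smp_rmb(), then loads ->key */
          if (obj && !try_get_ref(obj))      /* acquire on ->refcnt; can fail for freed objects */
                  obj = NULL;
          if (obj && obj->key != key) {      /* re-check: obj may have been moved or reused */
                  put_ref(obj);
                  obj = NULL;
          }
          rcu_read_unlock();
          return obj;
  }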

> 
> Thanx, Paul
> 
>>   hlist_add_head_rcu(&obj->obj_node, list);
>>   unlock_chain(); // typically a spin_unlock()
>> 
>> -- 
>> 2.34.1
>>

Patch

diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
index e06ed40bb6..77244adbdf 100644
--- a/Documentation/RCU/rculist_nulls.rst
+++ b/Documentation/RCU/rculist_nulls.rst
@@ -98,7 +98,7 @@  Quoting Corey Minyard::
 ----------------------
 
 We need to make sure a reader cannot read the new 'obj->obj_node.next' value
-and previous value of 'obj->key'. Otherwise, an item could be deleted
+and previous value of 'obj->key' at the same time. Otherwise, an item could be deleted
 from a chain, and inserted into another chain. If new chain was empty
 before the move, 'next' pointer is NULL, and lockless reader can not
 detect the fact that it missed following items in original chain.
@@ -112,7 +112,12 @@  detect the fact that it missed following items in original chain.
   obj = kmem_cache_alloc(...);
   lock_chain(); // typically a spin_lock()
   obj->key = key;
-  atomic_set_release(&obj->refcnt, 1); // key before refcnt
+  /*
+  * we need to make sure obj->key is updated before obj->obj_node.next
+  * and obj->refcnt
+  */
+  smp_wmb();
+  atomic_set(&obj->refcnt, 1);
   hlist_add_head_rcu(&obj->obj_node, list);
   unlock_chain(); // typically a spin_unlock()