diff mbox series

xen/spinlock: move debug helpers inside the locked regions

Message ID 20200729111330.64549-1-roger.pau@citrix.com (mailing list archive)
State New, archived

Commit Message

Roger Pau Monne July 29, 2020, 11:13 a.m. UTC
Debug helpers such as lock profiling or the invariant pCPU assertions
must strictly be performed inside the exclusive locked region, or else
races might happen.

Note the issue was not strictly introduced by the commit referenced in
the Fixes tag, since lock stats were already incremented before the
barrier, but that commit made it more apparent as manipulating the cpu
field could happen outside of the locked regions and thus trigger the
BUG_ON. This is only enabled on debug builds, and thus releases are
not affected.

Fixes: 80cba391a35 ('spinlocks: in debug builds store cpu holding the lock')
Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/common/spinlock.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Comments

Julien Grall July 29, 2020, 1:37 p.m. UTC | #1
Hi Roger,

On 29/07/2020 12:13, Roger Pau Monne wrote:
> Debug helpers such as lock profiling or the invariant pCPU assertions
> must strictly be performed inside the exclusive locked region, or else
> races might happen.
> 
> Note the issue was not strictly introduced by the pointed commit in
> the Fixes tag, since lock stats where already incremented before the
> barrier, but that commit made it more apparent as manipulating the cpu
> field could happen outside of the locked regions and thus trigger the
> BUG_ON.

From the wording, it is not entirely clear which BUG_ON() you are
referring to. I am guessing, it is the one in rel_lock(). Am I correct?

Otherwise, the change looks good to me.

Cheers,

> This is only enabled on debug builds, and thus releases are
> not affected.
> 
> Fixes: 80cba391a35 ('spinlocks: in debug builds store cpu holding the lock')
> Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>   xen/common/spinlock.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
> index 17f4519fc7..ce3106e2d3 100644
> --- a/xen/common/spinlock.c
> +++ b/xen/common/spinlock.c
> @@ -170,9 +170,9 @@ void inline _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
>               cb(data);
>           arch_lock_relax();
>       }
> +    arch_lock_acquire_barrier();
>       got_lock(&lock->debug);
>       LOCK_PROFILE_GOT;
> -    arch_lock_acquire_barrier();
>   }
>   
>   void _spin_lock(spinlock_t *lock)
> @@ -198,9 +198,9 @@ unsigned long _spin_lock_irqsave(spinlock_t *lock)
>   
>   void _spin_unlock(spinlock_t *lock)
>   {
> -    arch_lock_release_barrier();
>       LOCK_PROFILE_REL;
>       rel_lock(&lock->debug);
> +    arch_lock_release_barrier();
>       add_sized(&lock->tickets.head, 1);
>       arch_lock_signal();
>       preempt_enable();
> @@ -249,15 +249,15 @@ int _spin_trylock(spinlock_t *lock)
>           preempt_enable();
>           return 0;
>       }
> +    /*
> +     * cmpxchg() is a full barrier so no need for an
> +     * arch_lock_acquire_barrier().
> +     */
>       got_lock(&lock->debug);
>   #ifdef CONFIG_DEBUG_LOCK_PROFILE
>       if (lock->profile)
>           lock->profile->time_locked = NOW();
>   #endif
> -    /*
> -     * cmpxchg() is a full barrier so no need for an
> -     * arch_lock_acquire_barrier().
> -     */
>       return 1;
>   }
>   
>
Roger Pau Monne July 29, 2020, 1:50 p.m. UTC | #2
On Wed, Jul 29, 2020 at 02:37:44PM +0100, Julien Grall wrote:
> Hi Roger,
> 
> On 29/07/2020 12:13, Roger Pau Monne wrote:
> > Debug helpers such as lock profiling or the invariant pCPU assertions
> > must strictly be performed inside the exclusive locked region, or else
> > races might happen.
> > 
> > Note the issue was not strictly introduced by the pointed commit in
> > the Fixes tag, since lock stats where already incremented before the
> > barrier, but that commit made it more apparent as manipulating the cpu
> > field could happen outside of the locked regions and thus trigger the
> > BUG_ON.
> 
> From the wording, it is not entirely clear which BUG_ON() you are referring
> to. I am guessing, it is the one in rel_lock(). Am I correct?

Yes, that's right. Expanding to:

"...  and thus trigger the BUG_ON in rel_lock()." would be better.

> Otherwise, the change looks good to me.

Thanks.
Julien Grall July 29, 2020, 2:57 p.m. UTC | #3
Hi Roger,

On 29/07/2020 14:50, Roger Pau Monné wrote:
> On Wed, Jul 29, 2020 at 02:37:44PM +0100, Julien Grall wrote:
>> Hi Roger,
>>
>> On 29/07/2020 12:13, Roger Pau Monne wrote:
>>> Debug helpers such as lock profiling or the invariant pCPU assertions
>>> must strictly be performed inside the exclusive locked region, or else
>>> races might happen.
>>>
>>> Note the issue was not strictly introduced by the pointed commit in
>>> the Fixes tag, since lock stats where already incremented before the
>>> barrier, but that commit made it more apparent as manipulating the cpu
>>> field could happen outside of the locked regions and thus trigger the
>>> BUG_ON.
>>
>>  From the wording, it is not entirely clear which BUG_ON() you are referring
>> to. I am guessing, it is the one in rel_lock(). Am I correct?
> 
> Yes, that's right. Expanding to:
> 
> "...  and thus trigger the BUG_ON in rel_lock()." would be better.

Looks good to me. With that:

Reviewed-by: Julien Grall <jgrall@amazon.com>

I am happy to do the update on commit if there are no more comments.

Cheers,
Julien Grall July 30, 2020, 6:29 p.m. UTC | #4
On 29/07/2020 15:57, Julien Grall wrote:
> Hi Roger,
> 
> On 29/07/2020 14:50, Roger Pau Monné wrote:
>> On Wed, Jul 29, 2020 at 02:37:44PM +0100, Julien Grall wrote:
>>> Hi Roger,
>>>
>>> On 29/07/2020 12:13, Roger Pau Monne wrote:
>>>> Debug helpers such as lock profiling or the invariant pCPU assertions
>>>> must strictly be performed inside the exclusive locked region, or else
>>>> races might happen.
>>>>
>>>> Note the issue was not strictly introduced by the pointed commit in
>>>> the Fixes tag, since lock stats where already incremented before the
>>>> barrier, but that commit made it more apparent as manipulating the cpu
>>>> field could happen outside of the locked regions and thus trigger the
>>>> BUG_ON.
>>>
>>>  From the wording, it is not entirely clear which BUG_ON() you are 
>>> referring
>>> to. I am guessing, it is the one in rel_lock(). Am I correct?
>>
>> Yes, that's right. Expanding to:
>>
>> "...  and thus trigger the BUG_ON in rel_lock()." would be better.
> 
> Looks good to me. With that:
> 
> Reviewed-by: Julien Grall <jgrall@amazon.com>
> 
> I am happy to do the update on commit if there is no more comments.

Committed.

Thank you!

Cheers,

Patch

diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
index 17f4519fc7..ce3106e2d3 100644
--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -170,9 +170,9 @@  void inline _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
             cb(data);
         arch_lock_relax();
     }
+    arch_lock_acquire_barrier();
     got_lock(&lock->debug);
     LOCK_PROFILE_GOT;
-    arch_lock_acquire_barrier();
 }
 
 void _spin_lock(spinlock_t *lock)
@@ -198,9 +198,9 @@  unsigned long _spin_lock_irqsave(spinlock_t *lock)
 
 void _spin_unlock(spinlock_t *lock)
 {
-    arch_lock_release_barrier();
     LOCK_PROFILE_REL;
     rel_lock(&lock->debug);
+    arch_lock_release_barrier();
     add_sized(&lock->tickets.head, 1);
     arch_lock_signal();
     preempt_enable();
@@ -249,15 +249,15 @@  int _spin_trylock(spinlock_t *lock)
         preempt_enable();
         return 0;
     }
+    /*
+     * cmpxchg() is a full barrier so no need for an
+     * arch_lock_acquire_barrier().
+     */
     got_lock(&lock->debug);
 #ifdef CONFIG_DEBUG_LOCK_PROFILE
     if (lock->profile)
         lock->profile->time_locked = NOW();
 #endif
-    /*
-     * cmpxchg() is a full barrier so no need for an
-     * arch_lock_acquire_barrier().
-     */
     return 1;
 }
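
The ordering this patch establishes can be illustrated outside of Xen with a small C11 sketch. This is not the Xen implementation: `struct sketch_lock`, `sketch_lock_init()`, `sketch_lock_acquire()`, `sketch_lock_release()` and the `owner` and `acquisitions` fields are hypothetical names chosen for illustration. C11 fences stand in for arch_lock_acquire_barrier() and arch_lock_release_barrier(), and assert() stands in for the BUG_ON in rel_lock(). The only point is that debug bookkeeping on plain (non-atomic) fields sits after the acquire barrier and before the release barrier, so it cannot overlap with another holder's bookkeeping.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_OWNER (-1)

/* Hypothetical ticket lock with debug-only fields; not Xen's spinlock_t. */
struct sketch_lock {
    atomic_uint head;            /* ticket currently being served */
    atomic_uint tail;            /* next ticket to hand out */
    int owner;                   /* debug: CPU holding the lock (plain field) */
    unsigned long acquisitions;  /* debug: profiling counter (plain field) */
};

static void sketch_lock_init(struct sketch_lock *l)
{
    atomic_init(&l->head, 0);
    atomic_init(&l->tail, 0);
    l->owner = NO_OWNER;
    l->acquisitions = 0;
}

static void sketch_lock_acquire(struct sketch_lock *l, int cpu)
{
    /* Take a ticket and spin until it is being served. */
    unsigned int ticket =
        atomic_fetch_add_explicit(&l->tail, 1, memory_order_relaxed);

    while (atomic_load_explicit(&l->head, memory_order_relaxed) != ticket)
        ; /* busy-wait */

    /* Acquire barrier first: everything below is inside the locked region. */
    atomic_thread_fence(memory_order_acquire);

    /* Debug bookkeeping, now ordered after the previous holder's release. */
    assert(l->owner == NO_OWNER);
    l->owner = cpu;
    l->acquisitions++;
}

static void sketch_lock_release(struct sketch_lock *l, int cpu)
{
    /* Debug bookkeeping while the lock is still held. */
    assert(l->owner == cpu);
    l->owner = NO_OWNER;

    /* Release barrier, then hand the lock to the next ticket. */
    atomic_thread_fence(memory_order_release);
    atomic_fetch_add_explicit(&l->head, 1, memory_order_relaxed);
}
```

With the pre-patch ordering, the write to `owner` on the acquire side could be reordered before the point where the lock is actually obtained, and the clearing of `owner` on the release side could become visible after the next holder had already recorded itself. Either case can make the ownership check fail even though the locking itself is correct.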