RCU: reimplement RCU barrier to avoid deadlock

Message ID 1580151370-6409-1-git-send-email-igor.druzhinin@citrix.com (mailing list archive)
State New, archived
Series: RCU: reimplement RCU barrier to avoid deadlock

Commit Message

Igor Druzhinin Jan. 27, 2020, 6:56 p.m. UTC
The existing RCU barrier implementation is prone to a deadlock scenario
due to IRQs being re-enabled inside stopmachine context. If, due to a race,
IRQs are re-enabled on some of the CPUs and softirqs are allowed to be
processed in stopmachine context, which is exactly what currently happens
in rcu_barrier(), a timer interrupt is able to invoke the TSC
synchronization rendezvous. At that moment the CPU sending the TSC
synchronization IPI stalls waiting for the other CPUs to synchronize,
while they in turn are waiting in the stopmachine busy loop with IRQs
disabled.

To avoid the scenario above, reimplement rcu_barrier() in a way where
IRQs are never disabled at any point. The proposed implementation is just
a simplified and specialized version of stopmachine. The semantics of
the call are preserved.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
This change has been stress tested by repeatedly performing actions that
invoke rcu_barrier() and has not shown any issues.
---
 xen/common/rcupdate.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

Comments

Julien Grall Jan. 28, 2020, 9:32 a.m. UTC | #1
On 27/01/2020 18:56, Igor Druzhinin wrote:
> The existing RCU barrier implementation is prone to a deadlock scenario
> due to IRQs being re-enabled inside stopmachine context. If, due to a race,
> IRQs are re-enabled on some of the CPUs and softirqs are allowed to be
> processed in stopmachine context, which is exactly what currently happens
> in rcu_barrier(), a timer interrupt is able to invoke the TSC
> synchronization rendezvous. At that moment the CPU sending the TSC
> synchronization IPI stalls waiting for the other CPUs to synchronize,
> while they in turn are waiting in the stopmachine busy loop with IRQs
> disabled.
> 
> To avoid the scenario above, reimplement rcu_barrier() in a way where
> IRQs are never disabled at any point. The proposed implementation is just
> a simplified and specialized version of stopmachine. The semantics of
> the call are preserved.
stop_machine_run() is used in a few places within Xen. Why is this a 
problem for rcu_barrier() and not for the other callers?

>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> ---
> This change has been stress tested by repeatedly performing actions that
> invoke rcu_barrier() and has not shown any issues.
> ---
>   xen/common/rcupdate.c | 36 ++++++++++++++++++++++++++----------
>   1 file changed, 26 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
> index cb712c8..95a1f85 100644
> --- a/xen/common/rcupdate.c
> +++ b/xen/common/rcupdate.c
> @@ -145,6 +145,9 @@ struct rcu_barrier_data {
>       atomic_t *cpu_count;
>   };
>   
> +static DEFINE_PER_CPU(struct tasklet, rcu_barrier_tasklet);
> +static atomic_t rcu_barrier_cpu_count, rcu_barrier_cpu_done;
> +
>   static void rcu_barrier_callback(struct rcu_head *head)
>   {
>       struct rcu_barrier_data *data = container_of(
> @@ -152,12 +155,9 @@ static void rcu_barrier_callback(struct rcu_head *head)
>       atomic_inc(data->cpu_count);
>   }
>   
> -static int rcu_barrier_action(void *_cpu_count)
> +static void rcu_barrier_action(void *unused)
>   {
> -    struct rcu_barrier_data data = { .cpu_count = _cpu_count };
> -
> -    ASSERT(!local_irq_is_enabled());
> -    local_irq_enable();
> +    struct rcu_barrier_data data = { .cpu_count = &rcu_barrier_cpu_count };
>   
>       /*
>        * When callback is executed, all previously-queued RCU work on this CPU
> @@ -172,15 +172,30 @@ static int rcu_barrier_action(void *_cpu_count)
>           cpu_relax();
>       }
>   
> -    local_irq_disable();
> -
> -    return 0;
> +    atomic_inc(&rcu_barrier_cpu_done);
>   }
>   
>   int rcu_barrier(void)
>   {

stop_machine_run() requires interrupts to be enabled when called. Is
this requirement still the same here? If so, can we document it and add
an ASSERT?

> -    atomic_t cpu_count = ATOMIC_INIT(0);
> -    return stop_machine_run(rcu_barrier_action, &cpu_count, NR_CPUS);
> +    unsigned int i;
> +
> +    if ( !get_cpu_maps() )
> +        return -EBUSY;

I realize this is also present in the current implementation. However,
nobody seems to check the return value of the barrier. What would happen
if you continued without the RCU having been synchronized?

> +
> +    atomic_set(&rcu_barrier_cpu_count, 0);
> +    atomic_set(&rcu_barrier_cpu_done, 0);
> +
> +    for_each_online_cpu ( i )
> +        if ( i != smp_processor_id() )
> +            tasklet_schedule_on_cpu(&per_cpu(rcu_barrier_tasklet, i), i);
> +
> +    rcu_barrier_action(NULL);
> +
> +    while ( atomic_read(&rcu_barrier_cpu_done) != num_online_cpus() )
> +        cpu_relax();
> +
> +    put_cpu_maps();
> +    return 0;
>   }
>   
>   /* Is batch a before batch b ? */
> @@ -564,6 +579,7 @@ static void rcu_init_percpu_data(int cpu, struct rcu_ctrlblk *rcp,
>       rdp->cpu = cpu;
>       rdp->blimit = blimit;
>       init_timer(&rdp->idle_timer, rcu_idle_timer_handler, rdp, cpu);
> +    tasklet_init(&per_cpu(rcu_barrier_tasklet, cpu), rcu_barrier_action, NULL);
>   }
>   
>   static int cpu_callback(
>
Igor Druzhinin Jan. 28, 2020, 6:09 p.m. UTC | #2
On 28/01/2020 09:32, Julien Grall wrote:
> On 27/01/2020 18:56, Igor Druzhinin wrote:
>> The existing RCU barrier implementation is prone to a deadlock scenario
>> due to IRQs being re-enabled inside stopmachine context. If, due to a race,
>> IRQs are re-enabled on some of the CPUs and softirqs are allowed to be
>> processed in stopmachine context, which is exactly what currently happens
>> in rcu_barrier(), a timer interrupt is able to invoke the TSC
>> synchronization rendezvous. At that moment the CPU sending the TSC
>> synchronization IPI stalls waiting for the other CPUs to synchronize,
>> while they in turn are waiting in the stopmachine busy loop with IRQs
>> disabled.
>>
>> To avoid the scenario above, reimplement rcu_barrier() in a way where
>> IRQs are never disabled at any point. The proposed implementation is just
>> a simplified and specialized version of stopmachine. The semantics of
>> the call are preserved.
> stop_machine_run() is used in a few places within Xen. Why is this a problem for rcu_barrier() and not for the other callers?

It's true that some of them do re-enable interrupts (__cpu_disable).
The reason they are not prone to the described issue is that there is
currently no likely interrupt handler that might lock up the system.
There are, however, softirq handlers that do this (TSC sync), and
rcu_barrier() has to call process_pending_softirqs() inside stopmachine
context due to the nature of its implementation.
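
For reference, the pre-patch rcu_barrier_action() follows roughly this
pattern (a simplified sketch reconstructed from the hunks above, not a
verbatim copy); the process_pending_softirqs() call in the wait loop is
what lets the TSC synchronization rendezvous run while other CPUs may
still be sitting in the stopmachine busy loop with IRQs off:

    /*
     * Sketch of the pre-patch rcu_barrier_action(); runs in
     * stop_machine() context and is entered with IRQs disabled.
     */
    struct rcu_barrier_data data = { .cpu_count = _cpu_count };

    ASSERT(!local_irq_is_enabled());
    local_irq_enable();

    call_rcu(&data.head, rcu_barrier_callback);

    while ( atomic_read(data.cpu_count) != num_online_cpus() )
    {
        /*
         * May enter the TSC synchronization rendezvous, which sends an
         * IPI and then waits for CPUs that are still spinning with
         * IRQs disabled -- the deadlock described above.
         */
        process_pending_softirqs();
        cpu_relax();
    }

    local_irq_disable();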

>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>> ---
>> This change has been stress tested by repeatedly performing actions that
>> invoke rcu_barrier() and has not shown any issues.
>> ---
>>   xen/common/rcupdate.c | 36 ++++++++++++++++++++++++++----------
>>   1 file changed, 26 insertions(+), 10 deletions(-)
>>
>> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
>> index cb712c8..95a1f85 100644
>> --- a/xen/common/rcupdate.c
>> +++ b/xen/common/rcupdate.c
>> @@ -145,6 +145,9 @@ struct rcu_barrier_data {
>>       atomic_t *cpu_count;
>>   };
>>
>> +static DEFINE_PER_CPU(struct tasklet, rcu_barrier_tasklet);
>> +static atomic_t rcu_barrier_cpu_count, rcu_barrier_cpu_done;
>> +
>>   static void rcu_barrier_callback(struct rcu_head *head)
>>   {
>>       struct rcu_barrier_data *data = container_of(
>> @@ -152,12 +155,9 @@ static void rcu_barrier_callback(struct rcu_head *head)
>>       atomic_inc(data->cpu_count);
>>   }
>>
>> -static int rcu_barrier_action(void *_cpu_count)
>> +static void rcu_barrier_action(void *unused)
>>   {
>> -    struct rcu_barrier_data data = { .cpu_count = _cpu_count };
>> -
>> -    ASSERT(!local_irq_is_enabled());
>> -    local_irq_enable();
>> +    struct rcu_barrier_data data = { .cpu_count = &rcu_barrier_cpu_count };
>>
>>       /*
>>        * When callback is executed, all previously-queued RCU work on this CPU
>> @@ -172,15 +172,30 @@ static int rcu_barrier_action(void *_cpu_count)
>>           cpu_relax();
>>       }
>>
>> -    local_irq_disable();
>> -
>> -    return 0;
>> +    atomic_inc(&rcu_barrier_cpu_done);
>>   }
>>
>>   int rcu_barrier(void)
>>   {
> 
> stop_machine_run() requires interrupts to be enabled when called. Is this requirement still the same here? If so, can we document it and add an ASSERT?

Sure, will add.
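
A minimal sketch of what that might look like (an illustration only,
assuming the in_irq() and local_irq_is_enabled() helpers; the actual
wording may differ in v2):

    int rcu_barrier(void)
    {
        /*
         * The barrier spins with IRQs on and relies on tasklet IPIs
         * reaching the other CPUs, so it must not be entered from IRQ
         * context or with interrupts disabled.
         */
        ASSERT(!in_irq() && local_irq_is_enabled());

        if ( !get_cpu_maps() )
            return -EBUSY;

        /* ... rest as in the patch below ... */
    }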

>> -    atomic_t cpu_count = ATOMIC_INIT(0);
>> -    return stop_machine_run(rcu_barrier_action, &cpu_count, NR_CPUS);
>> +    unsigned int i;
>> +
>> +    if ( !get_cpu_maps() )
>> +        return -EBUSY;
> 
> I realize this is also present in the current implementation. However, nobody seems to check the return value of the barrier. What would happen if you continued without the RCU having been synchronized?

Probably a crash, as from what I saw the existing callers rely on it
completing its task. I either need to change the semantics of the call
or fix up the callers that might be affected. I'd prefer to do the
latter in a follow-up patch.
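
For illustration, the caller-side fix-up could be as simple as retrying
(a hypothetical sketch, not part of this patch):

    /* Hypothetical caller-side handling instead of ignoring the result. */
    while ( rcu_barrier() == -EBUSY )
        process_pending_softirqs();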

Igor
Julien Grall Jan. 29, 2020, 10:53 a.m. UTC | #3
Hi,

On 28/01/2020 18:09, Igor Druzhinin wrote:
> On 28/01/2020 09:32, Julien Grall wrote:
>> On 27/01/2020 18:56, Igor Druzhinin wrote:
>>> The existing RCU barrier implementation is prone to a deadlock scenario
>>> due to IRQs being re-enabled inside stopmachine context. If, due to a race,
>>> IRQs are re-enabled on some of the CPUs and softirqs are allowed to be
>>> processed in stopmachine context, which is exactly what currently happens
>>> in rcu_barrier(), a timer interrupt is able to invoke the TSC
>>> synchronization rendezvous. At that moment the CPU sending the TSC
>>> synchronization IPI stalls waiting for the other CPUs to synchronize,
>>> while they in turn are waiting in the stopmachine busy loop with IRQs
>>> disabled.
>>>
>>> To avoid the scenario above, reimplement rcu_barrier() in a way where
>>> IRQs are never disabled at any point. The proposed implementation is just
>>> a simplified and specialized version of stopmachine. The semantics of
>>> the call are preserved.
>> stop_machine_run() is used in a few places within Xen. Why is this a problem for rcu_barrier() and not for the other callers?
> 
> It's true that some of them do re-enable interrupts (__cpu_disable).
> The reason they are not prone to the described issue is that there is
> currently no likely interrupt handler that might lock up the system.
> There are, however, softirq handlers that do this (TSC sync), and
> rcu_barrier() has to call process_pending_softirqs() inside stopmachine
> context due to the nature of its implementation.

Thank you for the information. It feels to me we need to document the
expectations of stop_machine_run() so people don't hit the problem.
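
Something along these lines could capture it (a sketch only, assuming
the comment sits above the stop_machine_run() prototype in
xen/include/xen/stop_machine.h):

    /*
     * stop_machine_run: freeze the machine and run (*fn)(data) on @cpu.
     *
     * Must be called with IRQs enabled.  While it is in progress the
     * other CPUs spin in the stop-machine loop with IRQs disabled, so
     * fn() must not depend on any rendezvous (e.g. the TSC
     * synchronization softirq) that needs those CPUs to respond, or
     * the system deadlocks.
     */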

> 
>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>>> ---
>>> This change has been stress tested by repeatedly performing actions that
>>> invoke rcu_barrier() and has not shown any issues.
>>> ---
>>>    xen/common/rcupdate.c | 36 ++++++++++++++++++++++++++----------
>>>    1 file changed, 26 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
>>> index cb712c8..95a1f85 100644
>>> --- a/xen/common/rcupdate.c
>>> +++ b/xen/common/rcupdate.c
>>> @@ -145,6 +145,9 @@ struct rcu_barrier_data {
>>>        atomic_t *cpu_count;
>>>    };
>>>
>>> +static DEFINE_PER_CPU(struct tasklet, rcu_barrier_tasklet);
>>> +static atomic_t rcu_barrier_cpu_count, rcu_barrier_cpu_done;
>>> +
>>>    static void rcu_barrier_callback(struct rcu_head *head)
>>>    {
>>>        struct rcu_barrier_data *data = container_of(
>>> @@ -152,12 +155,9 @@ static void rcu_barrier_callback(struct rcu_head *head)
>>>        atomic_inc(data->cpu_count);
>>>    }
>>>
>>> -static int rcu_barrier_action(void *_cpu_count)
>>> +static void rcu_barrier_action(void *unused)
>>>    {
>>> -    struct rcu_barrier_data data = { .cpu_count = _cpu_count };
>>> -
>>> -    ASSERT(!local_irq_is_enabled());
>>> -    local_irq_enable();
>>> +    struct rcu_barrier_data data = { .cpu_count = &rcu_barrier_cpu_count };
>>>
>>>       /*
>>>         * When callback is executed, all previously-queued RCU work on this CPU
>>> @@ -172,15 +172,30 @@ static int rcu_barrier_action(void *_cpu_count)
>>>            cpu_relax();
>>>        }
>>>
>>> -    local_irq_disable();
>>> -
>>> -    return 0;
>>> +    atomic_inc(&rcu_barrier_cpu_done);
>>>    }
>>>
>>>   int rcu_barrier(void)
>>>    {
>>
>> stop_machine_run() requires interrupts to be enabled when called. Is this requirement still the same here? If so, can we document it and add an ASSERT?
> 
> Sure, will add.
> 
>>> -    atomic_t cpu_count = ATOMIC_INIT(0);
>>> -    return stop_machine_run(rcu_barrier_action, &cpu_count, NR_CPUS);
>>> +    unsigned int i;
>>> +
>>> +    if ( !get_cpu_maps() )
>>> +        return -EBUSY;
>>
>> I realize this is also present in the current implementation. However, nobody seems to check the return value of the barrier. What would happen if you continued without the RCU having been synchronized?
> 
> Probably a crash, as from what I saw the existing callers rely on it
> completing its task. I either need to change the semantics of the call
> or fix up the callers that might be affected. I'd prefer to do the
> latter in a follow-up patch.

Yes, this is a separate issue. But I thought I would point it out :).

Cheers,

Patch

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index cb712c8..95a1f85 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -145,6 +145,9 @@  struct rcu_barrier_data {
     atomic_t *cpu_count;
 };
 
+static DEFINE_PER_CPU(struct tasklet, rcu_barrier_tasklet);
+static atomic_t rcu_barrier_cpu_count, rcu_barrier_cpu_done;
+
 static void rcu_barrier_callback(struct rcu_head *head)
 {
     struct rcu_barrier_data *data = container_of(
@@ -152,12 +155,9 @@  static void rcu_barrier_callback(struct rcu_head *head)
     atomic_inc(data->cpu_count);
 }
 
-static int rcu_barrier_action(void *_cpu_count)
+static void rcu_barrier_action(void *unused)
 {
-    struct rcu_barrier_data data = { .cpu_count = _cpu_count };
-
-    ASSERT(!local_irq_is_enabled());
-    local_irq_enable();
+    struct rcu_barrier_data data = { .cpu_count = &rcu_barrier_cpu_count };
 
     /*
      * When callback is executed, all previously-queued RCU work on this CPU
@@ -172,15 +172,30 @@  static int rcu_barrier_action(void *_cpu_count)
         cpu_relax();
     }
 
-    local_irq_disable();
-
-    return 0;
+    atomic_inc(&rcu_barrier_cpu_done);
 }
 
 int rcu_barrier(void)
 {
-    atomic_t cpu_count = ATOMIC_INIT(0);
-    return stop_machine_run(rcu_barrier_action, &cpu_count, NR_CPUS);
+    unsigned int i;
+
+    if ( !get_cpu_maps() )
+        return -EBUSY;
+
+    atomic_set(&rcu_barrier_cpu_count, 0);
+    atomic_set(&rcu_barrier_cpu_done, 0);
+
+    for_each_online_cpu ( i )
+        if ( i != smp_processor_id() )
+            tasklet_schedule_on_cpu(&per_cpu(rcu_barrier_tasklet, i), i);
+
+    rcu_barrier_action(NULL);
+
+    while ( atomic_read(&rcu_barrier_cpu_done) != num_online_cpus() )
+        cpu_relax();
+
+    put_cpu_maps();
+    return 0;
 }
 
 /* Is batch a before batch b ? */
@@ -564,6 +579,7 @@  static void rcu_init_percpu_data(int cpu, struct rcu_ctrlblk *rcp,
     rdp->cpu = cpu;
     rdp->blimit = blimit;
     init_timer(&rdp->idle_timer, rcu_idle_timer_handler, rdp, cpu);
+    tasklet_init(&per_cpu(rcu_barrier_tasklet, cpu), rcu_barrier_action, NULL);
 }
 
 static int cpu_callback(