diff mbox series

[18/19] target/arm: Fix tsan warning in cpu.c

Message ID 20200522160755.886-19-robert.foley@linaro.org (mailing list archive)
State New, archived
Headers show
Series Add Thread Sanitizer support to QEMU | expand

Commit Message

Robert Foley May 22, 2020, 4:07 p.m. UTC
For example:
WARNING: ThreadSanitizer: data race (pid=11134)
  Atomic write of size 4 at 0x7bbc0000e0ac by main thread (mutexes: write M875):
    #0 __tsan_atomic32_store <null> (qemu-system-aarch64+0x394d84)
    #1 cpu_reset_interrupt hw/core/cpu.c:107:5 (qemu-system-aarch64+0x842f90)
    #2 arm_cpu_set_irq target/arm/cpu.c (qemu-system-aarch64+0x615a55)

  Previous read of size 4 at 0x7bbc0000e0ac by thread T7:
    #0 arm_cpu_has_work target/arm/cpu.c:78:16 (qemu-system-aarch64+0x6178ba)
    #1 cpu_has_work include/hw/core/cpu.h:700:12 (qemu-system-aarch64+0x68be2e)

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Robert Foley <robert.foley@linaro.org>
---
 target/arm/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Peter Maydell May 22, 2020, 5:44 p.m. UTC | #1
On Fri, 22 May 2020 at 17:15, Robert Foley <robert.foley@linaro.org> wrote:
>
> For example:
> WARNING: ThreadSanitizer: data race (pid=11134)
>   Atomic write of size 4 at 0x7bbc0000e0ac by main thread (mutexes: write M875):
>     #0 __tsan_atomic32_store <null> (qemu-system-aarch64+0x394d84)
>     #1 cpu_reset_interrupt hw/core/cpu.c:107:5 (qemu-system-aarch64+0x842f90)
>     #2 arm_cpu_set_irq target/arm/cpu.c (qemu-system-aarch64+0x615a55)
>
>   Previous read of size 4 at 0x7bbc0000e0ac by thread T7:
>     #0 arm_cpu_has_work target/arm/cpu.c:78:16 (qemu-system-aarch64+0x6178ba)
>     #1 cpu_has_work include/hw/core/cpu.h:700:12 (qemu-system-aarch64+0x68be2e)
>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Robert Foley <robert.foley@linaro.org>
> ---
>  target/arm/cpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 32bec156f2..cdb90582ee 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -75,7 +75,7 @@ static bool arm_cpu_has_work(CPUState *cs)
>      ARMCPU *cpu = ARM_CPU(cs);
>
>      return (cpu->power_state != PSCI_OFF)
> -        && cs->interrupt_request &
> +        && atomic_read(&cs->interrupt_request) &
>          (CPU_INTERRUPT_FIQ | CPU_INTERRUPT_HARD
>           | CPU_INTERRUPT_VFIQ | CPU_INTERRUPT_VIRQ
>           | CPU_INTERRUPT_EXITTB);

Every target's has_work function seems to access
cs->interrupt_request without using atomic_read() :
why does Arm need to do something special here?

More generally, the only place that currently
uses atomic_read() on the interrupt_request field
is cpu_handle_interrupt(), so if this field needs
special precautions to access then a lot of code
needs updating.

thanks
-- PMM
Robert Foley May 22, 2020, 9:33 p.m. UTC | #2
On Fri, 22 May 2020 at 13:44, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Fri, 22 May 2020 at 17:15, Robert Foley <robert.foley@linaro.org> wrote:
> >
> > For example:
> > WARNING: ThreadSanitizer: data race (pid=11134)
> >   Atomic write of size 4 at 0x7bbc0000e0ac by main thread (mutexes: write M875):
> >     #0 __tsan_atomic32_store <null> (qemu-system-aarch64+0x394d84)
> >     #1 cpu_reset_interrupt hw/core/cpu.c:107:5 (qemu-system-aarch64+0x842f90)
> >     #2 arm_cpu_set_irq target/arm/cpu.c (qemu-system-aarch64+0x615a55)
> >
> >   Previous read of size 4 at 0x7bbc0000e0ac by thread T7:
> >     #0 arm_cpu_has_work target/arm/cpu.c:78:16 (qemu-system-aarch64+0x6178ba)
> >     #1 cpu_has_work include/hw/core/cpu.h:700:12 (qemu-system-aarch64+0x68be2e)
> >
> > Cc: Peter Maydell <peter.maydell@linaro.org>
> > Cc: Richard Henderson <richard.henderson@linaro.org>
> > Signed-off-by: Robert Foley <robert.foley@linaro.org>
> > ---
> >  target/arm/cpu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> > index 32bec156f2..cdb90582ee 100644
> > --- a/target/arm/cpu.c
> > +++ b/target/arm/cpu.c
> > @@ -75,7 +75,7 @@ static bool arm_cpu_has_work(CPUState *cs)
> >      ARMCPU *cpu = ARM_CPU(cs);
> >
> >      return (cpu->power_state != PSCI_OFF)
> > -        && cs->interrupt_request &
> > +        && atomic_read(&cs->interrupt_request) &
> >          (CPU_INTERRUPT_FIQ | CPU_INTERRUPT_HARD
> >           | CPU_INTERRUPT_VFIQ | CPU_INTERRUPT_VIRQ
> >           | CPU_INTERRUPT_EXITTB);
>
> Every target's has_work function seems to access
> cs->interrupt_request without using atomic_read() :
> why does Arm need to do something special here?
>
> More generally, the only place that currently
> uses atomic_read() on the interrupt_request field
> is cpu_handle_interrupt(), so if this field needs
> special precautions to access then a lot of code
> needs updating.

TSan flagged this case as a potential data race. It does not mean
necessarily that there is an issue here, just that two threads were
accessing the data
without TSan detecting the synchronization.  TSan gives a few options
to silence the
warning, such as changing the locking, making it atomic, or adding
various types
of annotations to tell TSan to ignore it.  So in this case we had a
few options, such as
to change it to atomic or to simply annotate it and silence it.

We started our TSan testing using arm, and have been working to iron out the
TSan warnings there, and there alone initially.  Assuming that we are OK
with making this particular change, to silence the TSan warning,
then certainly it is a good point that we need to consider changing the
other places that access this field, since they will all see similar
TSan warnings.

Of course if we are not OK with these changes to silence the TSan tool,
that's OK too :).  In that case we can certainly just add an
annotation either in the
code or via our suppressions/blacklist and leave the code functionally
unchanged.

Thanks & Regards,
-Rob
>
> thanks
> -- PMM
Peter Maydell May 22, 2020, 10:36 p.m. UTC | #3
On Fri, 22 May 2020 at 22:33, Robert Foley <robert.foley@linaro.org> wrote:
> On Fri, 22 May 2020 at 13:44, Peter Maydell <peter.maydell@linaro.org> wrote:
> > Every target's has_work function seems to access
> > cs->interrupt_request without using atomic_read() :
> > why does Arm need to do something special here?
> >
> > More generally, the only place that currently
> > uses atomic_read() on the interrupt_request field
> > is cpu_handle_interrupt(), so if this field needs
> > special precautions to access then a lot of code
> > needs updating.
>
> TSan flagged this case as a potential data race. It does not mean
> necessarily that there is an issue here, just that two threads were
> accessing the data
> without TSan detecting the synchronization.  TSan gives a few options
> to silence the
> warning, such as changing the locking, making it atomic, or adding
> various types
> of annotations to tell TSan to ignore it.  So in this case we had a
> few options, such as
> to change it to atomic or to simply annotate it and silence it.
>
> We started our TSan testing using arm, and have been working to iron out the
> TSan warnings there, and there alone initially.  Assuming that we are OK
> with making this particular change, to silence the TSan warning,
> then certainly it is a good point that we need to consider changing the
> other places that access this field, since they will all see similar
> TSan warnings.

So is this:
 (a) a TSan false positive, because we've analysed the use
     of this struct field and know it's not a race because
     [details], but which we're choosing to silence in this way
 (b) an actual race for which the correct fix is to make the
     accesses atomic because [details]

?

Either way, the important part is the analysis which fills
in the "[details]" part, which should be in the commit message...

thanks
-- PMM
Emilio Cota May 23, 2020, 5:18 p.m. UTC | #4
On Fri, May 22, 2020 at 23:36:18 +0100, Peter Maydell wrote:
> So is this:
>  (a) a TSan false positive, because we've analysed the use
>      of this struct field and know it's not a race because
>      [details], but which we're choosing to silence in this way
>  (b) an actual race for which the correct fix is to make the
>      accesses atomic because [details]
> 
> ?

It is (b), as shown in the per-cpu lock series. In particular,
see these two patches:
- [PATCH v9 33/74] cpu: define cpu_interrupt_request helpers
https://lists.nongnu.org/archive/html/qemu-devel/2020-05/msg06322.html
- [PATCH v9 39/74] arm: convert to cpu_interrupt_request
https://lists.nongnu.org/archive/html/qemu-devel/2020-05/msg06328.html

Since a more thorough fix is included in that other series, I think this
patch should be dropped from this series -- I'll post a reply to patch
00/19 with more details.

Thanks,

		Emilio
Robert Foley May 26, 2020, 2:01 p.m. UTC | #5
On Sat, 23 May 2020 at 13:18, Emilio G. Cota <cota@braap.org> wrote:
>
> On Fri, May 22, 2020 at 23:36:18 +0100, Peter Maydell wrote:
> > So is this:
> >  (a) a TSan false positive, because we've analysed the use
> >      of this struct field and know it's not a race because
> >      [details], but which we're choosing to silence in this way
> >  (b) an actual race for which the correct fix is to make the
> >      accesses atomic because [details]
> >
> > ?
>
> It is (b), as shown in the per-cpu lock series. In particular,
> see these two patches:
> - [PATCH v9 33/74] cpu: define cpu_interrupt_request helpers
> https://lists.nongnu.org/archive/html/qemu-devel/2020-05/msg06322.html
> - [PATCH v9 39/74] arm: convert to cpu_interrupt_request
> https://lists.nongnu.org/archive/html/qemu-devel/2020-05/msg06328.html
>
> Since a more thorough fix is included in that other series, I think this
> patch should be dropped from this series -- I'll post a reply to patch
> 00/19 with more details.

I agree, we will re-focus the patch series a bit and drop this patch
from the series.

Thanks & Regards,
-Rob
> Thanks,
>
>                 Emilio
diff mbox series

Patch

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 32bec156f2..cdb90582ee 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -75,7 +75,7 @@  static bool arm_cpu_has_work(CPUState *cs)
     ARMCPU *cpu = ARM_CPU(cs);
 
     return (cpu->power_state != PSCI_OFF)
-        && cs->interrupt_request &
+        && atomic_read(&cs->interrupt_request) &
         (CPU_INTERRUPT_FIQ | CPU_INTERRUPT_HARD
          | CPU_INTERRUPT_VFIQ | CPU_INTERRUPT_VIRQ
          | CPU_INTERRUPT_EXITTB);