diff mbox series

[linux-next] powerpc: disable sanitizer in irq_soft_mask_set

Message ID 20220821010030.97539-1-zhouzhouyi@gmail.com (mailing list archive)
State New, archived
Headers show
Series [linux-next] powerpc: disable sanitizer in irq_soft_mask_set | expand

Commit Message

Zhouyi Zhou Aug. 21, 2022, 1 a.m. UTC
In ppc, compiler based sanitizer will generate instrument instructions
around statement WRITE_ONCE(local_paca->irq_soft_mask, mask):

   0xc000000000295cb0 <+0>:	addis   r2,r12,774
   0xc000000000295cb4 <+4>:	addi    r2,r2,16464
   0xc000000000295cb8 <+8>:	mflr    r0
   0xc000000000295cbc <+12>:	bl      0xc00000000008bb4c <mcount>
   0xc000000000295cc0 <+16>:	mflr    r0
   0xc000000000295cc4 <+20>:	std     r31,-8(r1)
   0xc000000000295cc8 <+24>:	addi    r3,r13,2354
   0xc000000000295ccc <+28>:	mr      r31,r13
   0xc000000000295cd0 <+32>:	std     r0,16(r1)
   0xc000000000295cd4 <+36>:	stdu    r1,-48(r1)
   0xc000000000295cd8 <+40>:	bl      0xc000000000609b98 <__asan_store1+8>
   0xc000000000295cdc <+44>:	nop
   0xc000000000295ce0 <+48>:	li      r9,1
   0xc000000000295ce4 <+52>:	stb     r9,2354(r31)
   0xc000000000295ce8 <+56>:	addi    r1,r1,48
   0xc000000000295cec <+60>:	ld      r0,16(r1)
   0xc000000000295cf0 <+64>:	ld      r31,-8(r1)
   0xc000000000295cf4 <+68>:	mtlr    r0

If there is a context switch before "stb     r9,2354(r31)", r31 may
not equal to r13, in such case, irq soft mask will not work.

This patch disable sanitizer in irq_soft_mask_set.

Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
---
Dear PPC developers

I found this bug when trying to do rcutorture tests in ppc VM of
Open Source Lab of Oregon State University following Paul E. McKenny's guidance.

console.log report following bug:

[  346.527467][  T100] BUG: using smp_processor_id() in preemptible [00000000] code: rcu_torture_rea/100^M
[  346.529416][  T100] caller is rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
[  346.531157][  T100] CPU: 4 PID: 100 Comm: rcu_torture_rea Tainted: G        W          5.19.0-rc5-next-20220708-dirty #253^M
[  346.533620][  T100] Call Trace:^M
[  346.534449][  T100] [c0000000094876c0] [c000000000ce2b68] dump_stack_lvl+0xbc/0x108 (unreliable)^M
[  346.536632][  T100] [c000000009487710] [c000000001712954] check_preemption_disabled+0x154/0x160^M
[  346.538665][  T100] [c0000000094877a0] [c0000000002ce2d4] rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
[  346.540830][  T100] [c0000000094878b0] [c0000000002cf3c0] __rcu_read_unlock+0x290/0x3b0^M
[  346.542746][  T100] [c000000009487910] [c0000000002bb330] rcu_torture_read_unlock+0x30/0xb0^M
[  346.544779][  T100] [c000000009487930] [c0000000002b7ff8] rcutorture_one_extend+0x198/0x810^M
[  346.546851][  T100] [c000000009487a10] [c0000000002b8bfc] rcu_torture_one_read+0x58c/0xc90^M
[  346.548844][  T100] [c000000009487ca0] [c0000000002b942c] rcu_torture_reader+0x12c/0x360^M
[  346.550784][  T100] [c000000009487db0] [c0000000001de978] kthread+0x1e8/0x220^M
[  346.552555][  T100] [c000000009487e10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64^M

After 12 days debugging, I finally narrow the problem to irq_soft_mask_set.

I am a beginner, hope I can be of some beneficial to the community ;-)

Thanks
Zhouyi
--
 arch/powerpc/include/asm/hw_irq.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christophe Leroy Aug. 22, 2022, 6:04 a.m. UTC | #1
Le 21/08/2022 à 03:00, Zhouyi Zhou a écrit :
> In ppc, compiler based sanitizer will generate instrument instructions
> around statement WRITE_ONCE(local_paca->irq_soft_mask, mask):
> 
>     0xc000000000295cb0 <+0>:	addis   r2,r12,774
>     0xc000000000295cb4 <+4>:	addi    r2,r2,16464
>     0xc000000000295cb8 <+8>:	mflr    r0
>     0xc000000000295cbc <+12>:	bl      0xc00000000008bb4c <mcount>
>     0xc000000000295cc0 <+16>:	mflr    r0
>     0xc000000000295cc4 <+20>:	std     r31,-8(r1)
>     0xc000000000295cc8 <+24>:	addi    r3,r13,2354
>     0xc000000000295ccc <+28>:	mr      r31,r13
>     0xc000000000295cd0 <+32>:	std     r0,16(r1)
>     0xc000000000295cd4 <+36>:	stdu    r1,-48(r1)
>     0xc000000000295cd8 <+40>:	bl      0xc000000000609b98 <__asan_store1+8>
>     0xc000000000295cdc <+44>:	nop
>     0xc000000000295ce0 <+48>:	li      r9,1
>     0xc000000000295ce4 <+52>:	stb     r9,2354(r31)
>     0xc000000000295ce8 <+56>:	addi    r1,r1,48
>     0xc000000000295cec <+60>:	ld      r0,16(r1)
>     0xc000000000295cf0 <+64>:	ld      r31,-8(r1)
>     0xc000000000295cf4 <+68>:	mtlr    r0
> 
> If there is a context switch before "stb     r9,2354(r31)", r31 may
> not equal to r13, in such case, irq soft mask will not work.
> 
> This patch disable sanitizer in irq_soft_mask_set.

Well spotted, thanks.

You should add:

Fixes: ef5b570d3700 ("powerpc/irq: Don't open code irq_soft_mask helpers")

By the way, I think the READ_ONCE() likely has the same issue so you 
should fix irq_soft_mask_return() at the same time.

> 
> Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
> ---
> Dear PPC developers
> 
> I found this bug when trying to do rcutorture tests in ppc VM of
> Open Source Lab of Oregon State University following Paul E. McKenny's guidance.
> 
> console.log report following bug:
> 
> [  346.527467][  T100] BUG: using smp_processor_id() in preemptible [00000000] code: rcu_torture_rea/100^M
> [  346.529416][  T100] caller is rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
> [  346.531157][  T100] CPU: 4 PID: 100 Comm: rcu_torture_rea Tainted: G        W          5.19.0-rc5-next-20220708-dirty #253^M
> [  346.533620][  T100] Call Trace:^M
> [  346.534449][  T100] [c0000000094876c0] [c000000000ce2b68] dump_stack_lvl+0xbc/0x108 (unreliable)^M
> [  346.536632][  T100] [c000000009487710] [c000000001712954] check_preemption_disabled+0x154/0x160^M
> [  346.538665][  T100] [c0000000094877a0] [c0000000002ce2d4] rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
> [  346.540830][  T100] [c0000000094878b0] [c0000000002cf3c0] __rcu_read_unlock+0x290/0x3b0^M
> [  346.542746][  T100] [c000000009487910] [c0000000002bb330] rcu_torture_read_unlock+0x30/0xb0^M
> [  346.544779][  T100] [c000000009487930] [c0000000002b7ff8] rcutorture_one_extend+0x198/0x810^M
> [  346.546851][  T100] [c000000009487a10] [c0000000002b8bfc] rcu_torture_one_read+0x58c/0xc90^M
> [  346.548844][  T100] [c000000009487ca0] [c0000000002b942c] rcu_torture_reader+0x12c/0x360^M
> [  346.550784][  T100] [c000000009487db0] [c0000000001de978] kthread+0x1e8/0x220^M
> [  346.552555][  T100] [c000000009487e10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64^M
> 
> After 12 days debugging, I finally narrow the problem to irq_soft_mask_set.
> 
> I am a beginner, hope I can be of some beneficial to the community ;-)
> 
> Thanks
> Zhouyi
> --
>   arch/powerpc/include/asm/hw_irq.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index 26ede09c521d..a5ae8d82cc9d 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -121,7 +121,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
>    * for the critical section and as a clobber because
>    * we changed paca->irq_soft_mask
>    */
> -static inline notrace void irq_soft_mask_set(unsigned long mask)
> +static inline notrace __no_kcsan __no_sanitize_address void irq_soft_mask_set(unsigned long mask)
>   {
>   	/*
>   	 * The irq mask must always include the STD bit if any are set.
Zhouyi Zhou Aug. 23, 2022, 1:43 a.m. UTC | #2
On Mon, Aug 22, 2022 at 2:04 PM Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
>
>
>
> Le 21/08/2022 à 03:00, Zhouyi Zhou a écrit :
> > In ppc, compiler based sanitizer will generate instrument instructions
> > around statement WRITE_ONCE(local_paca->irq_soft_mask, mask):
> >
> >     0xc000000000295cb0 <+0>:  addis   r2,r12,774
> >     0xc000000000295cb4 <+4>:  addi    r2,r2,16464
> >     0xc000000000295cb8 <+8>:  mflr    r0
> >     0xc000000000295cbc <+12>: bl      0xc00000000008bb4c <mcount>
> >     0xc000000000295cc0 <+16>: mflr    r0
> >     0xc000000000295cc4 <+20>: std     r31,-8(r1)
> >     0xc000000000295cc8 <+24>: addi    r3,r13,2354
> >     0xc000000000295ccc <+28>: mr      r31,r13
> >     0xc000000000295cd0 <+32>: std     r0,16(r1)
> >     0xc000000000295cd4 <+36>: stdu    r1,-48(r1)
> >     0xc000000000295cd8 <+40>: bl      0xc000000000609b98 <__asan_store1+8>
> >     0xc000000000295cdc <+44>: nop
> >     0xc000000000295ce0 <+48>: li      r9,1
> >     0xc000000000295ce4 <+52>: stb     r9,2354(r31)
> >     0xc000000000295ce8 <+56>: addi    r1,r1,48
> >     0xc000000000295cec <+60>: ld      r0,16(r1)
> >     0xc000000000295cf0 <+64>: ld      r31,-8(r1)
> >     0xc000000000295cf4 <+68>: mtlr    r0
> >
> > If there is a context switch before "stb     r9,2354(r31)", r31 may
> > not equal to r13, in such case, irq soft mask will not work.
> >
> > This patch disable sanitizer in irq_soft_mask_set.
>
> Well spotted, thanks.
Thank Christophe for reviewing my patch and your guidance!
>
> You should add:
>
> Fixes: ef5b570d3700 ("powerpc/irq: Don't open code irq_soft_mask helpers")
OK, I will do it!
>
> By the way, I think the READ_ONCE() likely has the same issue so you
> should fix irq_soft_mask_return() at the same time.
Yes, after disassembling irq_soft_mask_return, she has the same issue
as irq_soft_mask_set.

In addition, I read hw_irq.h by naked eye to search for removed inline
assembly one by one,
I found another place that we could possible enhance (Thank Paul E.
McKenny for teaching me use git blame ;-)):
077fc62b2b66a("powerpc/irq: remove inline assembly in hard_irq_disable macro")
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -282,9 +282,7 @@ static inline bool pmi_irq_pending(void)
        flags = irq_soft_mask_set_return(IRQS_ALL_DISABLED);            \
        local_paca->irq_happened |= PACA_IRQ_HARD_DIS;                  \
        if (!arch_irqs_disabled_flags(flags)) {                         \
-               asm ("stdx %%r1, 0, %1 ;"                               \
-                    : "=m" (local_paca->saved_r1)                      \
-                    : "b" (&local_paca->saved_r1));                    \
+               WRITE_ONCE(local_paca->saved_r1, current_stack_pointer);\
                trace_hardirqs_off();                                   \
        }                                                               \
 } while(0)
I wrap the macro hard_irq_disable into a test function and disassemble
it, she has the above issue too:
(gdb) disassemble test002
Dump of assembler code for function test002:
   0xc000000000295db0 <+0>:    addis   r2,r12,774
   0xc000000000295db4 <+4>:    addi    r2,r2,16464
   0xc000000000295db8 <+8>:    mflr    r0
   0xc000000000295dbc <+12>:    bl      0xc00000000008bacc <mcount>
   0xc000000000295dc0 <+16>:    mflr    r0
   0xc000000000295dc4 <+20>:    std     r30,-16(r1)
   0xc000000000295dc8 <+24>:    std     r31,-8(r1)
   0xc000000000295dcc <+28>:    li      r9,2
   0xc000000000295dd0 <+32>:    std     r0,16(r1)
   0xc000000000295dd4 <+36>:    stdu    r1,-48(r1)
   0xc000000000295dd8 <+40>:    mtmsrd  r9,1
   0xc000000000295ddc <+44>:    addi    r3,r13,2354
   0xc000000000295de0 <+48>:    mr      r31,r13
   0xc000000000295de4 <+52>:    bl      0xc000000000609838 <__asan_load1+8>
   0xc000000000295de8 <+56>:    nop
   0xc000000000295dec <+60>:    li      r3,3
   0xc000000000295df0 <+64>:    lbz     r30,2354(r31)
   0xc000000000295df4 <+68>:    bl      0xc00000000028de90 <irq_soft_mask_set>
   0xc000000000295df8 <+72>:    mr      r31,r13
   0xc000000000295dfc <+76>:    addi    r3,r13,2355
   0xc000000000295e00 <+80>:    bl      0xc000000000609838 <__asan_load1+8>
   0xc000000000295e04 <+84>:    nop
   0xc000000000295e08 <+88>:    lbz     r9,2355(r31)
   0xc000000000295e0c <+92>:    andi.   r10,r30,1
   0xc000000000295e10 <+96>:    ori     r9,r9,1
   0xc000000000295e14 <+100>:    stb     r9,2355(r31)
   0xc000000000295e18 <+104>:    bne     0xc000000000295e30 <test002+128>
   0xc000000000295e1c <+108>:    mr      r30,r1
   0xc000000000295e20 <+112>:    addi    r3,r31,2328
   0xc000000000295e24 <+116>:    bl      0xc000000000609dd8 <__asan_store8+8>
   0xc000000000295e28 <+120>:    nop
   0xc000000000295e2c <+124>:    std     r30,2328(r31)
   0xc000000000295e30 <+128>:    addi    r1,r1,48
   0xc000000000295e34 <+132>:    ld      r0,16(r1)
   0xc000000000295e38 <+136>:    ld      r30,-16(r1)
   0xc000000000295e3c <+140>:    ld      r31,-8(r1)
   0xc000000000295e40 <+144>:    mtlr    r0
   0xc000000000295e44 <+148>:    blr
Could we rewrite this macro into a static inline function as
irq_soft_mask_set does, and disable sanitizer for it?

Thanks again
Cheers
Zhouyi
>
> >
> > Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
> > ---
> > Dear PPC developers
> >
> > I found this bug when trying to do rcutorture tests in ppc VM of
> > Open Source Lab of Oregon State University following Paul E. McKenny's guidance.
> >
> > console.log report following bug:
> >
> > [  346.527467][  T100] BUG: using smp_processor_id() in preemptible [00000000] code: rcu_torture_rea/100^M
> > [  346.529416][  T100] caller is rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
> > [  346.531157][  T100] CPU: 4 PID: 100 Comm: rcu_torture_rea Tainted: G        W          5.19.0-rc5-next-20220708-dirty #253^M
> > [  346.533620][  T100] Call Trace:^M
> > [  346.534449][  T100] [c0000000094876c0] [c000000000ce2b68] dump_stack_lvl+0xbc/0x108 (unreliable)^M
> > [  346.536632][  T100] [c000000009487710] [c000000001712954] check_preemption_disabled+0x154/0x160^M
> > [  346.538665][  T100] [c0000000094877a0] [c0000000002ce2d4] rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
> > [  346.540830][  T100] [c0000000094878b0] [c0000000002cf3c0] __rcu_read_unlock+0x290/0x3b0^M
> > [  346.542746][  T100] [c000000009487910] [c0000000002bb330] rcu_torture_read_unlock+0x30/0xb0^M
> > [  346.544779][  T100] [c000000009487930] [c0000000002b7ff8] rcutorture_one_extend+0x198/0x810^M
> > [  346.546851][  T100] [c000000009487a10] [c0000000002b8bfc] rcu_torture_one_read+0x58c/0xc90^M
> > [  346.548844][  T100] [c000000009487ca0] [c0000000002b942c] rcu_torture_reader+0x12c/0x360^M
> > [  346.550784][  T100] [c000000009487db0] [c0000000001de978] kthread+0x1e8/0x220^M
> > [  346.552555][  T100] [c000000009487e10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64^M
> >
> > After 12 days debugging, I finally narrow the problem to irq_soft_mask_set.
> >
> > I am a beginner, hope I can be of some beneficial to the community ;-)
> >
> > Thanks
> > Zhouyi
> > --
> >   arch/powerpc/include/asm/hw_irq.h | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> > index 26ede09c521d..a5ae8d82cc9d 100644
> > --- a/arch/powerpc/include/asm/hw_irq.h
> > +++ b/arch/powerpc/include/asm/hw_irq.h
> > @@ -121,7 +121,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
> >    * for the critical section and as a clobber because
> >    * we changed paca->irq_soft_mask
> >    */
> > -static inline notrace void irq_soft_mask_set(unsigned long mask)
> > +static inline notrace __no_kcsan __no_sanitize_address void irq_soft_mask_set(unsigned long mask)
> >   {
> >       /*
> >        * The irq mask must always include the STD bit if any are set.
Christophe Leroy Aug. 23, 2022, 5:53 a.m. UTC | #3
Le 23/08/2022 à 03:43, Zhouyi Zhou a écrit :
> On Mon, Aug 22, 2022 at 2:04 PM Christophe Leroy
> <christophe.leroy@csgroup.eu> wrote:
>>
>>
>>
>> Le 21/08/2022 à 03:00, Zhouyi Zhou a écrit :
>>> In ppc, compiler based sanitizer will generate instrument instructions
>>> around statement WRITE_ONCE(local_paca->irq_soft_mask, mask):
>>>
>>>      0xc000000000295cb0 <+0>:  addis   r2,r12,774
>>>      0xc000000000295cb4 <+4>:  addi    r2,r2,16464
>>>      0xc000000000295cb8 <+8>:  mflr    r0
>>>      0xc000000000295cbc <+12>: bl      0xc00000000008bb4c <mcount>
>>>      0xc000000000295cc0 <+16>: mflr    r0
>>>      0xc000000000295cc4 <+20>: std     r31,-8(r1)
>>>      0xc000000000295cc8 <+24>: addi    r3,r13,2354
>>>      0xc000000000295ccc <+28>: mr      r31,r13
>>>      0xc000000000295cd0 <+32>: std     r0,16(r1)
>>>      0xc000000000295cd4 <+36>: stdu    r1,-48(r1)
>>>      0xc000000000295cd8 <+40>: bl      0xc000000000609b98 <__asan_store1+8>
>>>      0xc000000000295cdc <+44>: nop
>>>      0xc000000000295ce0 <+48>: li      r9,1
>>>      0xc000000000295ce4 <+52>: stb     r9,2354(r31)
>>>      0xc000000000295ce8 <+56>: addi    r1,r1,48
>>>      0xc000000000295cec <+60>: ld      r0,16(r1)
>>>      0xc000000000295cf0 <+64>: ld      r31,-8(r1)
>>>      0xc000000000295cf4 <+68>: mtlr    r0
>>>
>>> If there is a context switch before "stb     r9,2354(r31)", r31 may
>>> not equal to r13, in such case, irq soft mask will not work.
>>>
>>> This patch disable sanitizer in irq_soft_mask_set.
>>
>> Well spotted, thanks.
> Thank Christophe for reviewing my patch and your guidance!
>>
>> You should add:
>>
>> Fixes: ef5b570d3700 ("powerpc/irq: Don't open code irq_soft_mask helpers")
> OK, I will do it!
>>
>> By the way, I think the READ_ONCE() likely has the same issue so you
>> should fix irq_soft_mask_return() at the same time.
> Yes, after disassembling irq_soft_mask_return, she has the same issue
> as irq_soft_mask_set.
> 
> In addition, I read hw_irq.h by naked eye to search for removed inline
> assembly one by one,
> I found another place that we could possible enhance (Thank Paul E.
> McKenny for teaching me use git blame ;-)):
> 077fc62b2b66a("powerpc/irq: remove inline assembly in hard_irq_disable macro")
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -282,9 +282,7 @@ static inline bool pmi_irq_pending(void)
>          flags = irq_soft_mask_set_return(IRQS_ALL_DISABLED);            \
>          local_paca->irq_happened |= PACA_IRQ_HARD_DIS;                  \
>          if (!arch_irqs_disabled_flags(flags)) {                         \
> -               asm ("stdx %%r1, 0, %1 ;"                               \
> -                    : "=m" (local_paca->saved_r1)                      \
> -                    : "b" (&local_paca->saved_r1));                    \
> +               WRITE_ONCE(local_paca->saved_r1, current_stack_pointer);\
>                  trace_hardirqs_off();                                   \
>          }                                                               \
>   } while(0)
> I wrap the macro hard_irq_disable into a test function and disassemble
> it, she has the above issue too:
> (gdb) disassemble test002
> Dump of assembler code for function test002:
...
> Could we rewrite this macro into a static inline function as
> irq_soft_mask_set does, and disable sanitizer for it?

Yes we could try to do that, hoping it will not bring too much troubles 
with circular header inclusion.

Another possible approach could be to create a WRITE_ONCE_NOCHECK() more 
or less similar to existing READ_ONCE_NOCHECK().

We could also change READ_ONCE_NOCHECK() to accept bytes size then use 
it for irq_soft_mask_return() .

Christophe
Michael Ellerman Aug. 23, 2022, 8:33 a.m. UTC | #4
Zhouyi Zhou <zhouzhouyi@gmail.com> writes:
> In ppc, compiler based sanitizer will generate instrument instructions
> around statement WRITE_ONCE(local_paca->irq_soft_mask, mask):
>
>    0xc000000000295cb0 <+0>:	addis   r2,r12,774
>    0xc000000000295cb4 <+4>:	addi    r2,r2,16464
>    0xc000000000295cb8 <+8>:	mflr    r0
>    0xc000000000295cbc <+12>:	bl      0xc00000000008bb4c <mcount>
>    0xc000000000295cc0 <+16>:	mflr    r0
>    0xc000000000295cc4 <+20>:	std     r31,-8(r1)
>    0xc000000000295cc8 <+24>:	addi    r3,r13,2354
>    0xc000000000295ccc <+28>:	mr      r31,r13
>    0xc000000000295cd0 <+32>:	std     r0,16(r1)
>    0xc000000000295cd4 <+36>:	stdu    r1,-48(r1)
>    0xc000000000295cd8 <+40>:	bl      0xc000000000609b98 <__asan_store1+8>
>    0xc000000000295cdc <+44>:	nop
>    0xc000000000295ce0 <+48>:	li      r9,1
>    0xc000000000295ce4 <+52>:	stb     r9,2354(r31)
>    0xc000000000295ce8 <+56>:	addi    r1,r1,48
>    0xc000000000295cec <+60>:	ld      r0,16(r1)
>    0xc000000000295cf0 <+64>:	ld      r31,-8(r1)
>    0xc000000000295cf4 <+68>:	mtlr    r0
>
> If there is a context switch before "stb     r9,2354(r31)", r31 may
> not equal to r13, in such case, irq soft mask will not work.
>
> This patch disable sanitizer in irq_soft_mask_set.
>
> Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
> ---
> Dear PPC developers
>
> I found this bug when trying to do rcutorture tests in ppc VM of
> Open Source Lab of Oregon State University following Paul E. McKenny's guidance.
>
> console.log report following bug:
>
> [  346.527467][  T100] BUG: using smp_processor_id() in preemptible [00000000] code: rcu_torture_rea/100^M
> [  346.529416][  T100] caller is rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
> [  346.531157][  T100] CPU: 4 PID: 100 Comm: rcu_torture_rea Tainted: G        W          5.19.0-rc5-next-20220708-dirty #253^M
> [  346.533620][  T100] Call Trace:^M
> [  346.534449][  T100] [c0000000094876c0] [c000000000ce2b68] dump_stack_lvl+0xbc/0x108 (unreliable)^M
> [  346.536632][  T100] [c000000009487710] [c000000001712954] check_preemption_disabled+0x154/0x160^M
> [  346.538665][  T100] [c0000000094877a0] [c0000000002ce2d4] rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
> [  346.540830][  T100] [c0000000094878b0] [c0000000002cf3c0] __rcu_read_unlock+0x290/0x3b0^M
> [  346.542746][  T100] [c000000009487910] [c0000000002bb330] rcu_torture_read_unlock+0x30/0xb0^M
> [  346.544779][  T100] [c000000009487930] [c0000000002b7ff8] rcutorture_one_extend+0x198/0x810^M
> [  346.546851][  T100] [c000000009487a10] [c0000000002b8bfc] rcu_torture_one_read+0x58c/0xc90^M
> [  346.548844][  T100] [c000000009487ca0] [c0000000002b942c] rcu_torture_reader+0x12c/0x360^M
> [  346.550784][  T100] [c000000009487db0] [c0000000001de978] kthread+0x1e8/0x220^M
> [  346.552555][  T100] [c000000009487e10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64^M
>
> After 12 days debugging, I finally narrow the problem to irq_soft_mask_set.

Thanks for spending 12 days debugging it! O_o

> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index 26ede09c521d..a5ae8d82cc9d 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -121,7 +121,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
>   * for the critical section and as a clobber because
>   * we changed paca->irq_soft_mask
>   */
> -static inline notrace void irq_soft_mask_set(unsigned long mask)
> +static inline notrace __no_kcsan __no_sanitize_address void irq_soft_mask_set(unsigned long mask)
>  {
>  	/*
>  	 * The irq mask must always include the STD bit if any are set.

My worry is that this will force irq_soft_mask_set() out of line, which
we would rather avoid. It's meant to be a fast path.

In fact with this applied I see nearly 300 out-of-line copies of the
function when building a defconfig, and ~1700 calls to it.

Normally it is inlined at every call site.


So I think I'm inclined to revert ef5b570d3700 ("powerpc/irq: Don't open
code irq_soft_mask helpers").

It was a nice looking cleanup, but those loads must not be instrumented
by KASAN, but we also want them inlined, and AFAICS the only way to
achieve that is to go back to inline asm.

cheers
Christophe Leroy Aug. 23, 2022, 8:47 a.m. UTC | #5
Le 23/08/2022 à 10:33, Michael Ellerman a écrit :
> Zhouyi Zhou <zhouzhouyi@gmail.com> writes:
>> In ppc, compiler based sanitizer will generate instrument instructions
>> around statement WRITE_ONCE(local_paca->irq_soft_mask, mask):
>>
>>     0xc000000000295cb0 <+0>:	addis   r2,r12,774
>>     0xc000000000295cb4 <+4>:	addi    r2,r2,16464
>>     0xc000000000295cb8 <+8>:	mflr    r0
>>     0xc000000000295cbc <+12>:	bl      0xc00000000008bb4c <mcount>
>>     0xc000000000295cc0 <+16>:	mflr    r0
>>     0xc000000000295cc4 <+20>:	std     r31,-8(r1)
>>     0xc000000000295cc8 <+24>:	addi    r3,r13,2354
>>     0xc000000000295ccc <+28>:	mr      r31,r13
>>     0xc000000000295cd0 <+32>:	std     r0,16(r1)
>>     0xc000000000295cd4 <+36>:	stdu    r1,-48(r1)
>>     0xc000000000295cd8 <+40>:	bl      0xc000000000609b98 <__asan_store1+8>
>>     0xc000000000295cdc <+44>:	nop
>>     0xc000000000295ce0 <+48>:	li      r9,1
>>     0xc000000000295ce4 <+52>:	stb     r9,2354(r31)
>>     0xc000000000295ce8 <+56>:	addi    r1,r1,48
>>     0xc000000000295cec <+60>:	ld      r0,16(r1)
>>     0xc000000000295cf0 <+64>:	ld      r31,-8(r1)
>>     0xc000000000295cf4 <+68>:	mtlr    r0
>>
>> If there is a context switch before "stb     r9,2354(r31)", r31 may
>> not equal to r13, in such case, irq soft mask will not work.
>>
>> This patch disable sanitizer in irq_soft_mask_set.
>>
>> Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
>> ---
>> Dear PPC developers
>>
>> I found this bug when trying to do rcutorture tests in ppc VM of
>> Open Source Lab of Oregon State University following Paul E. McKenny's guidance.
>>
>> console.log report following bug:
>>
>> [  346.527467][  T100] BUG: using smp_processor_id() in preemptible [00000000] code: rcu_torture_rea/100^M
>> [  346.529416][  T100] caller is rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
>> [  346.531157][  T100] CPU: 4 PID: 100 Comm: rcu_torture_rea Tainted: G        W          5.19.0-rc5-next-20220708-dirty #253^M
>> [  346.533620][  T100] Call Trace:^M
>> [  346.534449][  T100] [c0000000094876c0] [c000000000ce2b68] dump_stack_lvl+0xbc/0x108 (unreliable)^M
>> [  346.536632][  T100] [c000000009487710] [c000000001712954] check_preemption_disabled+0x154/0x160^M
>> [  346.538665][  T100] [c0000000094877a0] [c0000000002ce2d4] rcu_preempt_deferred_qs_irqrestore+0x74/0xed0^M
>> [  346.540830][  T100] [c0000000094878b0] [c0000000002cf3c0] __rcu_read_unlock+0x290/0x3b0^M
>> [  346.542746][  T100] [c000000009487910] [c0000000002bb330] rcu_torture_read_unlock+0x30/0xb0^M
>> [  346.544779][  T100] [c000000009487930] [c0000000002b7ff8] rcutorture_one_extend+0x198/0x810^M
>> [  346.546851][  T100] [c000000009487a10] [c0000000002b8bfc] rcu_torture_one_read+0x58c/0xc90^M
>> [  346.548844][  T100] [c000000009487ca0] [c0000000002b942c] rcu_torture_reader+0x12c/0x360^M
>> [  346.550784][  T100] [c000000009487db0] [c0000000001de978] kthread+0x1e8/0x220^M
>> [  346.552555][  T100] [c000000009487e10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64^M
>>
>> After 12 days debugging, I finally narrow the problem to irq_soft_mask_set.
> 
> Thanks for spending 12 days debugging it! O_o
> 
>> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
>> index 26ede09c521d..a5ae8d82cc9d 100644
>> --- a/arch/powerpc/include/asm/hw_irq.h
>> +++ b/arch/powerpc/include/asm/hw_irq.h
>> @@ -121,7 +121,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
>>    * for the critical section and as a clobber because
>>    * we changed paca->irq_soft_mask
>>    */
>> -static inline notrace void irq_soft_mask_set(unsigned long mask)
>> +static inline notrace __no_kcsan __no_sanitize_address void irq_soft_mask_set(unsigned long mask)
>>   {
>>   	/*
>>   	 * The irq mask must always include the STD bit if any are set.
> 
> My worry is that this will force irq_soft_mask_set() out of line, which
> we would rather avoid. It's meant to be a fast path.
> 
> In fact with this applied I see nearly 300 out-of-line copies of the
> function when building a defconfig, and ~1700 calls to it.
> 
> Normally it is inlined at every call site.
> 
> 
> So I think I'm inclined to revert ef5b570d3700 ("powerpc/irq: Don't open
> code irq_soft_mask helpers").

Could you revert it only partially ? In extenso, revert the 
READ/WRITE_ONCE and bring back the inline asm in irq_soft_mask_return() 
  and irq_soft_mask_set(), but keep other changes.

> 
> It was a nice looking cleanup, but those loads must not be instrumented
> by KASAN, but we also want them inlined, and AFAICS the only way to
> achieve that is to go back to inline asm.
> 

It's a pitty.

Would it be acceptable to have it out of line when KASAN is selected and 
inline otherwise ? In that case there is __no_sanitize_or_inline

Christophe
Christophe Leroy Aug. 23, 2022, 4:50 p.m. UTC | #6
Le 23/08/2022 à 10:47, Christophe Leroy a écrit :
> 
> 
> Le 23/08/2022 à 10:33, Michael Ellerman a écrit :
>> Zhouyi Zhou <zhouzhouyi@gmail.com> writes:
>>
>> My worry is that this will force irq_soft_mask_set() out of line, which
>> we would rather avoid. It's meant to be a fast path.
>>
>> In fact with this applied I see nearly 300 out-of-line copies of the
>> function when building a defconfig, and ~1700 calls to it.
>>
>> Normally it is inlined at every call site.
>>
>>
>> So I think I'm inclined to revert ef5b570d3700 ("powerpc/irq: Don't open
>> code irq_soft_mask helpers").
> 
> Could you revert it only partially ? In extenso, revert the 
> READ/WRITE_ONCE and bring back the inline asm in irq_soft_mask_return() 
>   and irq_soft_mask_set(), but keep other changes.

I sent a patch doing that.

Christophe
Zhouyi Zhou Aug. 24, 2022, 1:25 a.m. UTC | #7
On Wed, Aug 24, 2022 at 12:50 AM Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
>
>
>
> Le 23/08/2022 à 10:47, Christophe Leroy a écrit :
> >
> >
> > Le 23/08/2022 à 10:33, Michael Ellerman a écrit :
> >> Zhouyi Zhou <zhouzhouyi@gmail.com> writes:
> >>
> >> My worry is that this will force irq_soft_mask_set() out of line, which
> >> we would rather avoid. It's meant to be a fast path.
> >>
> >> In fact with this applied I see nearly 300 out-of-line copies of the
> >> function when building a defconfig, and ~1700 calls to it.
> >>
> >> Normally it is inlined at every call site.
> >>
> >>
> >> So I think I'm inclined to revert ef5b570d3700 ("powerpc/irq: Don't open
> >> code irq_soft_mask helpers").
> >
> > Could you revert it only partially ? In extenso, revert the
> > READ/WRITE_ONCE and bring back the inline asm in irq_soft_mask_return()
> >   and irq_soft_mask_set(), but keep other changes.
>
> I sent a patch doing that.
Thank Christophe for the fix. I am very glad to be of benefit to the
community ;-)
Also thank Michael and Paul for your constant encouragement and
guidance, I learned to use objdump to count the number of failed
inline function calls today ;-)

By the way, from my experiments, both gcc-11 and clang-14 behave the
same as Michael has described.

Cheers
Zhouyi
>
> Christophe
diff mbox series

Patch

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 26ede09c521d..a5ae8d82cc9d 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -121,7 +121,7 @@  static inline notrace unsigned long irq_soft_mask_return(void)
  * for the critical section and as a clobber because
  * we changed paca->irq_soft_mask
  */
-static inline notrace void irq_soft_mask_set(unsigned long mask)
+static inline notrace __no_kcsan __no_sanitize_address void irq_soft_mask_set(unsigned long mask)
 {
 	/*
 	 * The irq mask must always include the STD bit if any are set.