diff mbox series

parisc: Fix code/instruction patching on PA1.x machines

Message ID YX8HK7ZZZhjRQzcr@ls3530 (mailing list archive)
State New, archived
Headers show
Series parisc: Fix code/instruction patching on PA1.x machines | expand

Commit Message

Helge Deller Oct. 31, 2021, 9:14 p.m. UTC
On PA1.x machines it's not sufficient to just flush the data and
instruction caches when we have written new instruction codes into the
parallel mapped memory segment, but we really need to invalidate (purge)
the cache too. Otherwise the processor will still execute the old
instructions which are still in the data/instruction cache.

Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 4e87ace902cf ("parisc: add support for patching multiple words")
Cc: stable@vger.kernel.org # v5.3+

Comments

John David Anglin Nov. 3, 2021, 8:12 p.m. UTC | #1
Hi Helge,

I think the real problem is that neither flush_kernel_vmap_range() or invalidate_kernel_vmap_range()
flush the icache.  They only operate on the data cache. flush_icache_range will flush both caches.

Dave

On 2021-10-31 5:14 p.m., Helge Deller wrote:
> On PA1.x machines it's not sufficient to just flush the data and
> instruction caches when we have written new instruction codes into the
> parallel mapped memory segment, but we really need to invalidate (purge)
> the cache too. Otherwise the processor will still execute the old
> instructions which are still in the data/instruction cache.
>
> Signed-off-by: Helge Deller <deller@gmx.de>
> Fixes: 4e87ace902cf ("parisc: add support for patching multiple words")
> Cc: stable@vger.kernel.org # v5.3+
>
> diff --git a/arch/parisc/kernel/patch.c b/arch/parisc/kernel/patch.c
> index 80a0ab372802..8cbb7e1d5a2b 100644
> --- a/arch/parisc/kernel/patch.c
> +++ b/arch/parisc/kernel/patch.c
> @@ -81,7 +81,7 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
>   			 * We're crossing a page boundary, so
>   			 * need to remap
>   			 */
> -			flush_kernel_vmap_range((void *)fixmap,
> +			invalidate_kernel_vmap_range((void *)fixmap,
>   						(p-fixmap) * sizeof(*p));
>   			if (mapped)
>   				patch_unmap(FIX_TEXT_POKE0, &flags);
> @@ -90,9 +90,10 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
>   		}
>   	}
>
> -	flush_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p));
> +	invalidate_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p));
>   	if (mapped)
>   		patch_unmap(FIX_TEXT_POKE0, &flags);
> +	invalidate_kernel_vmap_range((void *)start, end - start);
>   	flush_icache_range(start, end);
>   }
>
Helge Deller Nov. 3, 2021, 9:08 p.m. UTC | #2
Hi Dave,

On 11/3/21 21:12, John David Anglin wrote:
> I think the real problem is that neither flush_kernel_vmap_range() or
> invalidate_kernel_vmap_range() flush the icache.  They only operate
> on the data cache. flush_icache_range will flush both caches.
Yes.
But we write the new instructions to a congruently memory are (same
physical memory like the kernel code), then flush/invalidate the
D-Cache, and finally flush the I-cache of kernel code memory.
See last function call of __patch_text_multiple().

So, logically I think it should work (and it does on PA2.x).

Or do you mean to flush the I-Cache of both mappings?

Helge


>
> Dave
>
> On 2021-10-31 5:14 p.m., Helge Deller wrote:
>> On PA1.x machines it's not sufficient to just flush the data and
>> instruction caches when we have written new instruction codes into the
>> parallel mapped memory segment, but we really need to invalidate (purge)
>> the cache too. Otherwise the processor will still execute the old
>> instructions which are still in the data/instruction cache.
>>
>> Signed-off-by: Helge Deller <deller@gmx.de>
>> Fixes: 4e87ace902cf ("parisc: add support for patching multiple words")
>> Cc: stable@vger.kernel.org # v5.3+
>>
>> diff --git a/arch/parisc/kernel/patch.c b/arch/parisc/kernel/patch.c
>> index 80a0ab372802..8cbb7e1d5a2b 100644
>> --- a/arch/parisc/kernel/patch.c
>> +++ b/arch/parisc/kernel/patch.c
>> @@ -81,7 +81,7 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
>>                * We're crossing a page boundary, so
>>                * need to remap
>>                */
>> -            flush_kernel_vmap_range((void *)fixmap,
>> +            invalidate_kernel_vmap_range((void *)fixmap,
>>                           (p-fixmap) * sizeof(*p));
>>               if (mapped)
>>                   patch_unmap(FIX_TEXT_POKE0, &flags);
>> @@ -90,9 +90,10 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
>>           }
>>       }
>>
>> -    flush_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p));
>> +    invalidate_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p));
>>       if (mapped)
>>           patch_unmap(FIX_TEXT_POKE0, &flags);
>> +    invalidate_kernel_vmap_range((void *)start, end - start);
>>       flush_icache_range(start, end);

^^  HERE


>>   }
>>
>
>
John David Anglin Nov. 4, 2021, 5:57 p.m. UTC | #3
On 2021-11-03 5:08 p.m., Helge Deller wrote:
> Hi Dave,
> 
> On 11/3/21 21:12, John David Anglin wrote:
>> I think the real problem is that neither flush_kernel_vmap_range() or
>> invalidate_kernel_vmap_range() flush the icache.  They only operate
>> on the data cache. flush_icache_range will flush both caches.
> Yes.
> But we write the new instructions to a congruently memory are (same
> physical memory like the kernel code), then flush/invalidate the
> D-Cache, and finally flush the I-cache of kernel code memory.
> See last function call of __patch_text_multiple().
> 
> So, logically I think it should work (and it does on PA2.x).

I still believe it is incorrect to use invalidate_kernel_vmap_range() to flush the writes in
__patch_text_multiple() to memory.  Both the PA 1.1 and 2.0 architecture documents state that
it is implementation dependent whether pdc writes back dirty lines to memory at priority 0.
If the writes are not flushed to memory, the patch won't install.

> 
> Or do you mean to flush the I-Cache of both mappings?

I reviewed the flush operations in __patch_text_multiple().  I believe the current code is more or
less correct, but not optimal.  I believe the final call to flush_icache_range() is unnecessary.  We
can also eliminate one range flush in the calls to make sure we don't have any aliases in the cache.
See change attached below.

The big problem with __patch_text_multiple() is can only patch code that the patch code doesn't depend
on.  This line in __patch_text_multiple() will cause a TLB fault on the first execution after a new
patch_map is setup. We do alternative patching to the TLB handler when ALT_COND_NO_SMP is true.  For
example,

         .macro          ptl_lock        spc,ptp,pte,tmp,tmp1,fault
#ifdef CONFIG_TLB_PTLOCK
98:     cmpib,COND(=),n 0,\spc,2f
         get_ptl         \tmp
1:      LDCW            0(\tmp),\tmp1
         cmpib,COND(=)   0,\tmp1,1b
         nop
         LDREG           0(\ptp),\pte
         bb,<,n          \pte,_PAGE_PRESENT_BIT,3f
         b               \fault
         stw             \spc,0(\tmp)
99:     ALTERNATIVE(98b, 99b, ALT_COND_NO_SMP, INSN_NOP)
#endif
2:      LDREG           0(\ptp),\pte
         bb,>=,n         \pte,_PAGE_PRESENT_BIT,\fault
3:
         .endm

If the region being patched spans a page boundary, we will get a TLB fault with partially patched TLB
code.  I suspect something like this is the real issue with the fixmap code.

Maybe this could be avoided by patching in a "b,n 99f" instruction at 98b.  That would avoid patching
in multiple nop instructions.

Dave

> 
> Helge
> 
> 
>>
>> Dave
>>
>> On 2021-10-31 5:14 p.m., Helge Deller wrote:
>>> On PA1.x machines it's not sufficient to just flush the data and
>>> instruction caches when we have written new instruction codes into the
>>> parallel mapped memory segment, but we really need to invalidate (purge)
>>> the cache too. Otherwise the processor will still execute the old
>>> instructions which are still in the data/instruction cache.
>>>
>>> Signed-off-by: Helge Deller <deller@gmx.de>
>>> Fixes: 4e87ace902cf ("parisc: add support for patching multiple words")
>>> Cc: stable@vger.kernel.org # v5.3+
>>>
>>> diff --git a/arch/parisc/kernel/patch.c b/arch/parisc/kernel/patch.c
>>> index 80a0ab372802..8cbb7e1d5a2b 100644
>>> --- a/arch/parisc/kernel/patch.c
>>> +++ b/arch/parisc/kernel/patch.c
>>> @@ -81,7 +81,7 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
>>>                 * We're crossing a page boundary, so
>>>                 * need to remap
>>>                 */
>>> -            flush_kernel_vmap_range((void *)fixmap,
>>> +            invalidate_kernel_vmap_range((void *)fixmap,
>>>                            (p-fixmap) * sizeof(*p));
>>>                if (mapped)
>>>                    patch_unmap(FIX_TEXT_POKE0, &flags);
>>> @@ -90,9 +90,10 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
>>>            }
>>>        }
>>>
>>> -    flush_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p));
>>> +    invalidate_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p));
>>>        if (mapped)
>>>            patch_unmap(FIX_TEXT_POKE0, &flags);
>>> +    invalidate_kernel_vmap_range((void *)start, end - start);
>>>        flush_icache_range(start, end);
> 

---
diff --git a/arch/parisc/kernel/patch.c b/arch/parisc/kernel/patch.c
index 80a0ab372802..799795bc4210 100644
--- a/arch/parisc/kernel/patch.c
+++ b/arch/parisc/kernel/patch.c
@@ -67,8 +67,8 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
  	int mapped;

  	/* Make sure we don't have any aliases in cache */
-	flush_kernel_vmap_range(addr, len);
  	flush_icache_range(start, end);
+	flush_tlb_kernel_range(start, end);

  	p = fixmap = patch_map(addr, FIX_TEXT_POKE0, &flags, &mapped);

@@ -93,7 +93,6 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
  	flush_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p));
  	if (mapped)
  		patch_unmap(FIX_TEXT_POKE0, &flags);
-	flush_icache_range(start, end);
  }

  void __kprobes __patch_text(void *addr, u32 insn)
John David Anglin Nov. 4, 2021, 6:06 p.m. UTC | #4
On 2021-11-04 1:57 p.m., John David Anglin wrote:
> On 2021-11-03 5:08 p.m., Helge Deller wrote:
>> Hi Dave,
>>
>> On 11/3/21 21:12, John David Anglin wrote:
>>> I think the real problem is that neither flush_kernel_vmap_range() or
>>> invalidate_kernel_vmap_range() flush the icache.  They only operate
>>> on the data cache. flush_icache_range will flush both caches.
>> Yes.
>> But we write the new instructions to a congruently memory are (same
>> physical memory like the kernel code), then flush/invalidate the
>> D-Cache, and finally flush the I-cache of kernel code memory.
>> See last function call of __patch_text_multiple().
>>
>> So, logically I think it should work (and it does on PA2.x).
>
> I still believe it is incorrect to use invalidate_kernel_vmap_range() to flush the writes in
> __patch_text_multiple() to memory.  Both the PA 1.1 and 2.0 architecture documents state that
> it is implementation dependent whether pdc writes back dirty lines to memory at priority 0.
> If the writes are not flushed to memory, the patch won't install.
>
>>
>> Or do you mean to flush the I-Cache of both mappings?
>
> I reviewed the flush operations in __patch_text_multiple().  I believe the current code is more or
> less correct, but not optimal.  I believe the final call to flush_icache_range() is unnecessary.  We
> can also eliminate one range flush in the calls to make sure we don't have any aliases in the cache.
> See change attached below.
>
> The big problem with __patch_text_multiple() is can only patch code that the patch code doesn't depend
> on.  This line in __patch_text_multiple() will cause a TLB fault on the first execution after a new
         *p++ = *insn++;

> patch_map is setup. We do alternative patching to the TLB handler when ALT_COND_NO_SMP is true.  For
> example,
>
>         .macro          ptl_lock        spc,ptp,pte,tmp,tmp1,fault
> #ifdef CONFIG_TLB_PTLOCK
> 98:     cmpib,COND(=),n 0,\spc,2f
>         get_ptl         \tmp
> 1:      LDCW            0(\tmp),\tmp1
>         cmpib,COND(=)   0,\tmp1,1b
>         nop
>         LDREG           0(\ptp),\pte
>         bb,<,n          \pte,_PAGE_PRESENT_BIT,3f
>         b               \fault
>         stw             \spc,0(\tmp)
> 99:     ALTERNATIVE(98b, 99b, ALT_COND_NO_SMP, INSN_NOP)
> #endif
> 2:      LDREG           0(\ptp),\pte
>         bb,>=,n         \pte,_PAGE_PRESENT_BIT,\fault
> 3:
>         .endm
>
> If the region being patched spans a page boundary, we will get a TLB fault with partially patched TLB
> code.  I suspect something like this is the real issue with the fixmap code.
>
> Maybe this could be avoided by patching in a "b,n 99f" instruction at 98b.  That would avoid patching
> in multiple nop instructions.
>
> Dave
>
>>
>> Helge
>>
>>
>>>
>>> Dave
>>>
>>> On 2021-10-31 5:14 p.m., Helge Deller wrote:
>>>> On PA1.x machines it's not sufficient to just flush the data and
>>>> instruction caches when we have written new instruction codes into the
>>>> parallel mapped memory segment, but we really need to invalidate (purge)
>>>> the cache too. Otherwise the processor will still execute the old
>>>> instructions which are still in the data/instruction cache.
>>>>
>>>> Signed-off-by: Helge Deller <deller@gmx.de>
>>>> Fixes: 4e87ace902cf ("parisc: add support for patching multiple words")
>>>> Cc: stable@vger.kernel.org # v5.3+
>>>>
>>>> diff --git a/arch/parisc/kernel/patch.c b/arch/parisc/kernel/patch.c
>>>> index 80a0ab372802..8cbb7e1d5a2b 100644
>>>> --- a/arch/parisc/kernel/patch.c
>>>> +++ b/arch/parisc/kernel/patch.c
>>>> @@ -81,7 +81,7 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
>>>>                 * We're crossing a page boundary, so
>>>>                 * need to remap
>>>>                 */
>>>> -            flush_kernel_vmap_range((void *)fixmap,
>>>> +            invalidate_kernel_vmap_range((void *)fixmap,
>>>>                            (p-fixmap) * sizeof(*p));
>>>>                if (mapped)
>>>>                    patch_unmap(FIX_TEXT_POKE0, &flags);
>>>> @@ -90,9 +90,10 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
>>>>            }
>>>>        }
>>>>
>>>> -    flush_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p));
>>>> +    invalidate_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p));
>>>>        if (mapped)
>>>>            patch_unmap(FIX_TEXT_POKE0, &flags);
>>>> +    invalidate_kernel_vmap_range((void *)start, end - start);
>>>>        flush_icache_range(start, end);
>>
>
> ---
> diff --git a/arch/parisc/kernel/patch.c b/arch/parisc/kernel/patch.c
> index 80a0ab372802..799795bc4210 100644
> --- a/arch/parisc/kernel/patch.c
> +++ b/arch/parisc/kernel/patch.c
> @@ -67,8 +67,8 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
>      int mapped;
>
>      /* Make sure we don't have any aliases in cache */
> -    flush_kernel_vmap_range(addr, len);
>      flush_icache_range(start, end);
> +    flush_tlb_kernel_range(start, end);
>
>      p = fixmap = patch_map(addr, FIX_TEXT_POKE0, &flags, &mapped);
>
> @@ -93,7 +93,6 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
>      flush_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p));
>      if (mapped)
>          patch_unmap(FIX_TEXT_POKE0, &flags);
> -    flush_icache_range(start, end);
>  }
>
>  void __kprobes __patch_text(void *addr, u32 insn)
Helge Deller Nov. 4, 2021, 8:48 p.m. UTC | #5
Hi Dave,

On 11/4/21 18:57, John David Anglin wrote:
> On 2021-11-03 5:08 p.m., Helge Deller wrote:
>> On 11/3/21 21:12, John David Anglin wrote:
>>> I think the real problem is that neither flush_kernel_vmap_range() or
>>> invalidate_kernel_vmap_range() flush the icache.  They only operate
>>> on the data cache. flush_icache_range will flush both caches.
>> Yes.
>> But we write the new instructions to a congruently memory are (same
>> physical memory like the kernel code), then flush/invalidate the
>> D-Cache, and finally flush the I-cache of kernel code memory.
>> See last function call of __patch_text_multiple().
>>
>> So, logically I think it should work (and it does on PA2.x).
>
> I still believe it is incorrect to use invalidate_kernel_vmap_range() to flush the writes in
> __patch_text_multiple() to memory.  Both the PA 1.1 and 2.0 architecture documents state that
> it is implementation dependent whether pdc writes back dirty lines to memory at priority 0.
> If the writes are not flushed to memory, the patch won't install.
>
>> Or do you mean to flush the I-Cache of both mappings?
>
> I reviewed the flush operations in __patch_text_multiple().  I believe the current code is more or
> less correct, but not optimal.  I believe the final call to flush_icache_range() is unnecessary.  We
> can also eliminate one range flush in the calls to make sure we don't have any aliases in the cache.
> See change attached below.
>
> The big problem with __patch_text_multiple() is can only patch code that the patch code doesn't depend
> on.  This line in __patch_text_multiple() will cause a TLB fault on the first execution after a new
> patch_map is setup. We do alternative patching to the TLB handler when ALT_COND_NO_SMP is true.

I believe the alternative code patching isn't critical.
It just patches in NOPs, so even if the newly patched NOPs aren't visible yet (when the TLB handler is
executed), it uses the old instructions which shouldn't harm anything. They were executed before the
whole time.

I do agree, that patching other code paths (with non-NOPs) could be critial.

By the way, to narrow down the problem, I do boot those tests here with the "no-alternatives"
kernel parameter, which effectively disables the alternatives-patching.

> For example,
>
>         .macro          ptl_lock        spc,ptp,pte,tmp,tmp1,fault
> #ifdef CONFIG_TLB_PTLOCK
> 98:     cmpib,COND(=),n 0,\spc,2f
>         get_ptl         \tmp
> 1:      LDCW            0(\tmp),\tmp1
>         cmpib,COND(=)   0,\tmp1,1b
>         nop
>         LDREG           0(\ptp),\pte
>         bb,<,n          \pte,_PAGE_PRESENT_BIT,3f
>         b               \fault
>         stw             \spc,0(\tmp)
> 99:     ALTERNATIVE(98b, 99b, ALT_COND_NO_SMP, INSN_NOP)
> #endif
> 2:      LDREG           0(\ptp),\pte
>         bb,>=,n         \pte,_PAGE_PRESENT_BIT,\fault
> 3:
>         .endm
>
> If the region being patched spans a page boundary, we will get a TLB fault with partially patched TLB
> code.  I suspect something like this is the real issue with the fixmap code.
>
> Maybe this could be avoided by patching in a "b,n 99f" instruction at 98b.  That would avoid patching
> in multiple nop instructions.

It's already being done that way. See alternative.c:
                /*
                 * Replace instruction with NOPs?
                 * For long distance insert a branch instruction instead.
                 */
                if (replacement == INSN_NOP && len > 1)
                        replacement = 0xe8000002 + (len-2)*8; /* "b,n .+8" */

>>> On 2021-10-31 5:14 p.m., Helge Deller wrote:
>>>> On PA1.x machines it's not sufficient to just flush the data and
>>>> instruction caches when we have written new instruction codes into the
>>>> parallel mapped memory segment, but we really need to invalidate (purge)
>>>> the cache too. Otherwise the processor will still execute the old
>>>> instructions which are still in the data/instruction cache.
>>>>
>>>> Signed-off-by: Helge Deller <deller@gmx.de>
>>>> Fixes: 4e87ace902cf ("parisc: add support for patching multiple words")
>>>> Cc: stable@vger.kernel.org # v5.3+
>>>>
>>>> diff --git a/arch/parisc/kernel/patch.c b/arch/parisc/kernel/patch.c
>>>> index 80a0ab372802..8cbb7e1d5a2b 100644
>>>> --- a/arch/parisc/kernel/patch.c
>>>> +++ b/arch/parisc/kernel/patch.c
>>>> @@ -81,7 +81,7 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
>>>>                 * We're crossing a page boundary, so
>>>>                 * need to remap
>>>>                 */
>>>> -            flush_kernel_vmap_range((void *)fixmap,
>>>> +            invalidate_kernel_vmap_range((void *)fixmap,
>>>>                            (p-fixmap) * sizeof(*p));
>>>>                if (mapped)
>>>>                    patch_unmap(FIX_TEXT_POKE0, &flags);
>>>> @@ -90,9 +90,10 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
>>>>            }
>>>>        }
>>>>
>>>> -    flush_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p));
>>>> +    invalidate_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p));
>>>>        if (mapped)
>>>>            patch_unmap(FIX_TEXT_POKE0, &flags);
>>>> +    invalidate_kernel_vmap_range((void *)start, end - start);
>>>>        flush_icache_range(start, end);
>>
>
> ---
> diff --git a/arch/parisc/kernel/patch.c b/arch/parisc/kernel/patch.c
> index 80a0ab372802..799795bc4210 100644
> --- a/arch/parisc/kernel/patch.c
> +++ b/arch/parisc/kernel/patch.c
> @@ -67,8 +67,8 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
>      int mapped;
>
>      /* Make sure we don't have any aliases in cache */
> -    flush_kernel_vmap_range(addr, len);
>      flush_icache_range(start, end);
> +    flush_tlb_kernel_range(start, end);
>
>      p = fixmap = patch_map(addr, FIX_TEXT_POKE0, &flags, &mapped);
>
> @@ -93,7 +93,6 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
>      flush_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p));
>      if (mapped)
>          patch_unmap(FIX_TEXT_POKE0, &flags);
> -    flush_icache_range(start, end);
>  }
>
>  void __kprobes __patch_text(void *addr, u32 insn)

Here is the syslog with your patch on the 715/64:
...
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 148832K/163840K available (8452K kernel code, 1461K rwdata, 2379K rodata, 696K init, 444K bss, 15008K reserved, 0K cma-reserved)
ftrace: allocating 26912 entries in 53 pages
Backtrace:
 [<1097d588>] __patch_text+0x20/0x30
 [<101c5128>] ftrace_make_nop+0x3c/0x68
 [<102c98a8>] ftrace_process_locs.isra.0+0x208/0x2b0
 [<10113ba0>] ftrace_init+0xa8/0x154
 [<10101284>] start_kernel+0x3d0/0x708
 [<10105244>] start_parisc+0xb8/0xec

Bad Address (null pointer deref?): Code=15 (Data TLB miss fault) at addr 0effe310
CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-32bit+ #1017
Hardware name: 9000/715

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00000000000011000000001100001110 Not tainted
r00-03  000c030e 10c7cda8 1097d474 10c67340
r04-07  10100314 00000000 10c672dc 10c7d328
r08-11  00000000 00000000 10cbe5a8 00000000
r12-15  10c75da8 10b2f800 10d7d5a8 00000001
r16-19  00000000 00000002 00000000 08000240
r20-23  10c67000 00000200 10100500 00000020
r24-27  0efff000 0efff000 0effe310 10b915a8
r28-31  0effe310 00000000 10c673c0 10279a68
sr00-03  00000000 00000000 00000000 00000000
sr04-07  00000000 00000000 00000000 00000000

IASQ: 00000000 00000000 IAOQ: 1097d4e4 1097d480
 IIR: 0f9312a8    ISR: 00000000  IOR: 0effe310
 CPU:        0   CR30: 10c67000 CR31: f00effff
 ORIG_R28: 00000000
 IAOQ[0]: __patch_text_multiple+0xdc/0x12c
 IAOQ[1]: __patch_text_multiple+0x78/0x12c
 RP(r2): __patch_text_multiple+0x6c/0x12c
Backtrace:
 [<1097d588>] __patch_text+0x20/0x30
 [<101c5128>] ftrace_make_nop+0x3c/0x68
 [<102c98a8>] ftrace_process_locs.isra.0+0x208/0x2b0
 [<10113ba0>] ftrace_init+0xa8/0x154
 [<10101284>] start_kernel+0x3d0/0x708
 [<10105244>] start_parisc+0xb8/0xec
Kernel panic - not syncing: Bad Address (null pointer deref?)

Helge
John David Anglin Nov. 4, 2021, 9:16 p.m. UTC | #6
On 2021-11-04 4:48 p.m., Helge Deller wrote:
> Here is the syslog with your patch on the 715/64:
> ...
> Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
> mem auto-init: stack:off, heap alloc:off, heap free:off
> Memory: 148832K/163840K available (8452K kernel code, 1461K rwdata, 2379K rodata, 696K init, 444K bss, 15008K reserved, 0K cma-reserved)
> ftrace: allocating 26912 entries in 53 pages
> Backtrace:
>   [<1097d588>] __patch_text+0x20/0x30
>   [<101c5128>] ftrace_make_nop+0x3c/0x68
>   [<102c98a8>] ftrace_process_locs.isra.0+0x208/0x2b0
>   [<10113ba0>] ftrace_init+0xa8/0x154
>   [<10101284>] start_kernel+0x3d0/0x708
>   [<10105244>] start_parisc+0xb8/0xec
>
> Bad Address (null pointer deref?): Code=15 (Data TLB miss fault) at addr 0effe310
> CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-32bit+ #1017
> Hardware name: 9000/715
>
>       YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> PSW: 00000000000011000000001100001110 Not tainted
> r00-03  000c030e 10c7cda8 1097d474 10c67340
> r04-07  10100314 00000000 10c672dc 10c7d328
> r08-11  00000000 00000000 10cbe5a8 00000000
> r12-15  10c75da8 10b2f800 10d7d5a8 00000001
> r16-19  00000000 00000002 00000000 08000240
> r20-23  10c67000 00000200 10100500 00000020
> r24-27  0efff000 0efff000 0effe310 10b915a8
> r28-31  0effe310 00000000 10c673c0 10279a68
> sr00-03  00000000 00000000 00000000 00000000
> sr04-07  00000000 00000000 00000000 00000000
>
> IASQ: 00000000 00000000 IAOQ: 1097d4e4 1097d480
>   IIR: 0f9312a8    ISR: 00000000  IOR: 0effe310
>   CPU:        0   CR30: 10c67000 CR31: f00effff
>   ORIG_R28: 00000000
>   IAOQ[0]: __patch_text_multiple+0xdc/0x12c
>   IAOQ[1]: __patch_text_multiple+0x78/0x12c
>   RP(r2): __patch_text_multiple+0x6c/0x12c
> Backtrace:
>   [<1097d588>] __patch_text+0x20/0x30
>   [<101c5128>] ftrace_make_nop+0x3c/0x68
>   [<102c98a8>] ftrace_process_locs.isra.0+0x208/0x2b0
>   [<10113ba0>] ftrace_init+0xa8/0x154
>   [<10101284>] start_kernel+0x3d0/0x708
>   [<10105244>] start_parisc+0xb8/0xec
> Kernel panic - not syncing: Bad Address (null pointer deref?)
The faulting instruction is "stw,ma r19,4(ret0)".  r19 contains nop instruction.  r28
contains the address being patched.  Where does it (0x0effe310) point in kernel?

It seems likely to me that TLB handler is broken and that's the reason for the bad address fault.
But maybe there's a map problem.

Dave
John David Anglin Nov. 4, 2021, 9:19 p.m. UTC | #7
On 2021-11-04 4:48 p.m., Helge Deller wrote:
> I believe the alternative code patching isn't critical.
> It just patches in NOPs, so even if the newly patched NOPs aren't visible yet (when the TLB handler is
> executed), it uses the old instructions which shouldn't harm anything. They were executed before the
> whole time.
But we flush the data cache when we hit a page boundary.  This makes these NOPs visible.

Maybe we should do both flushes after all words have been written.

Dave
Helge Deller Nov. 4, 2021, 9:27 p.m. UTC | #8
Hi Dave,

On 11/4/21 22:16, John David Anglin wrote:
> On 2021-11-04 4:48 p.m., Helge Deller wrote:
>> Here is the syslog with your patch on the 715/64:
>> ...
>> Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
>> mem auto-init: stack:off, heap alloc:off, heap free:off
>> Memory: 148832K/163840K available (8452K kernel code, 1461K rwdata, 2379K rodata, 696K init, 444K bss, 15008K reserved, 0K cma-reserved)
>> ftrace: allocating 26912 entries in 53 pages
>> Backtrace:
>>   [<1097d588>] __patch_text+0x20/0x30
>>   [<101c5128>] ftrace_make_nop+0x3c/0x68
>>   [<102c98a8>] ftrace_process_locs.isra.0+0x208/0x2b0
>>   [<10113ba0>] ftrace_init+0xa8/0x154
>>   [<10101284>] start_kernel+0x3d0/0x708
>>   [<10105244>] start_parisc+0xb8/0xec
>>
>> Bad Address (null pointer deref?): Code=15 (Data TLB miss fault) at addr 0effe310
>> CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-32bit+ #1017
>> Hardware name: 9000/715
>>
>>       YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
>> PSW: 00000000000011000000001100001110 Not tainted
>> r00-03  000c030e 10c7cda8 1097d474 10c67340
>> r04-07  10100314 00000000 10c672dc 10c7d328
>> r08-11  00000000 00000000 10cbe5a8 00000000
>> r12-15  10c75da8 10b2f800 10d7d5a8 00000001
>> r16-19  00000000 00000002 00000000 08000240
>> r20-23  10c67000 00000200 10100500 00000020
>> r24-27  0efff000 0efff000 0effe310 10b915a8
>> r28-31  0effe310 00000000 10c673c0 10279a68
>> sr00-03  00000000 00000000 00000000 00000000
>> sr04-07  00000000 00000000 00000000 00000000
>>
>> IASQ: 00000000 00000000 IAOQ: 1097d4e4 1097d480
>>   IIR: 0f9312a8    ISR: 00000000  IOR: 0effe310
>>   CPU:        0   CR30: 10c67000 CR31: f00effff
>>   ORIG_R28: 00000000
>>   IAOQ[0]: __patch_text_multiple+0xdc/0x12c
>>   IAOQ[1]: __patch_text_multiple+0x78/0x12c
>>   RP(r2): __patch_text_multiple+0x6c/0x12c
>> Backtrace:
>>   [<1097d588>] __patch_text+0x20/0x30
>>   [<101c5128>] ftrace_make_nop+0x3c/0x68
>>   [<102c98a8>] ftrace_process_locs.isra.0+0x208/0x2b0
>>   [<10113ba0>] ftrace_init+0xa8/0x154
>>   [<10101284>] start_kernel+0x3d0/0x708
>>   [<10105244>] start_parisc+0xb8/0xec
>> Kernel panic - not syncing: Bad Address (null pointer deref?)
> The faulting instruction is "stw,ma r19,4(ret0)".  r19 contains nop instruction.  r28
> contains the address being patched.  Where does it (0x0effe310) point in kernel?
>
> It seems likely to me that TLB handler is broken and that's the reason for the bad address fault.
> But maybe there's a map problem.

Right, there is a map problem!
Please ignore this last backtrace I sent.
I missed to include my previous patch to fix the TLB issue, aka this patch:
 https://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git/commit/?h=parisc-5.16-1&id=6e866a462867b60841202e900f10936a0478608c
So, this backtrace is bogus...

With my patch (and yours) applied now it comes further.
Maybe there is generic ftrace/kprobe problem in kernel v5.15 into which I run?
I'll continue testing on git head from Linus and report back soon...

Helge
John David Anglin Nov. 4, 2021, 9:35 p.m. UTC | #9
Hi Helge,

On 2021-11-04 5:27 p.m., Helge Deller wrote:
> Right, there is a map problem!
> Please ignore this last backtrace I sent.
> I missed to include my previous patch to fix the TLB issue, aka this patch:
>   https://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git/commit/?h=parisc-5.16-1&id=6e866a462867b60841202e900f10936a0478608c
I have this change installed in my tree.  rp3440 boots okay with my patch but I don't have CONFIG_KPROBES set.

Dave
Sven Schnelle Nov. 6, 2021, 8:42 p.m. UTC | #10
Helge Deller <deller@gmx.de> writes:

> Hi Dave,
>
> On 11/4/21 18:57, John David Anglin wrote:
>> On 2021-11-03 5:08 p.m., Helge Deller wrote:
>>> On 11/3/21 21:12, John David Anglin wrote:
>>>> I think the real problem is that neither flush_kernel_vmap_range() or
>>>> invalidate_kernel_vmap_range() flush the icache.  They only operate
>>>> on the data cache. flush_icache_range will flush both caches.
>>> Yes.
>>> But we write the new instructions to a congruently memory are (same
>>> physical memory like the kernel code), then flush/invalidate the
>>> D-Cache, and finally flush the I-cache of kernel code memory.
>>> See last function call of __patch_text_multiple().
>>>
>>> So, logically I think it should work (and it does on PA2.x).
>>
>> I still believe it is incorrect to use invalidate_kernel_vmap_range() to flush the writes in
>> __patch_text_multiple() to memory.  Both the PA 1.1 and 2.0 architecture documents state that
>> it is implementation dependent whether pdc writes back dirty lines to memory at priority 0.
>> If the writes are not flushed to memory, the patch won't install.
>>
>>> Or do you mean to flush the I-Cache of both mappings?
>>
>> I reviewed the flush operations in __patch_text_multiple().  I believe the current code is more or
>> less correct, but not optimal.  I believe the final call to flush_icache_range() is unnecessary.  We
>> can also eliminate one range flush in the calls to make sure we don't have any aliases in the cache.
>> See change attached below.
>>
>> The big problem with __patch_text_multiple() is can only patch code that the patch code doesn't depend
>> on.  This line in __patch_text_multiple() will cause a TLB fault on the first execution after a new
>> patch_map is setup. We do alternative patching to the TLB handler when ALT_COND_NO_SMP is true.
>
> I believe the alternative code patching isn't critical.
> It just patches in NOPs, so even if the newly patched NOPs aren't visible yet (when the TLB handler is
> executed), it uses the old instructions which shouldn't harm anything. They were executed before the
> whole time.
>
> I do agree, that patching other code paths (with non-NOPs) could be critial.
>
> By the way, to narrow down the problem, I do boot those tests here with the "no-alternatives"
> kernel parameter, which effectively disables the alternatives-patching.
>

I think there are two paths in code patching, that are mixed up in the
text above:

a) __patch_text_multiple(), which is only used by ftrace (and kprobes on
   ftrace). This patches the functions as follows:

   - it patches the trampoline located before the function start
   - after doing so, it patches the branch which sits right at the
     beginning of the function. When removing, it first removes the branch
     and the trampoline code afterwards. This is done via two calls to
     __patch_text_multiple(), which isn't ideal. See
     ftrace_make_nop()/ftrace_make_call()

b) The alternative patching, which replaces code with memcpy(), which
   might suffer from the tlb problem mentioned above.

Sven
diff mbox series

Patch

diff --git a/arch/parisc/kernel/patch.c b/arch/parisc/kernel/patch.c
index 80a0ab372802..8cbb7e1d5a2b 100644
--- a/arch/parisc/kernel/patch.c
+++ b/arch/parisc/kernel/patch.c
@@ -81,7 +81,7 @@  void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
 			 * We're crossing a page boundary, so
 			 * need to remap
 			 */
-			flush_kernel_vmap_range((void *)fixmap,
+			invalidate_kernel_vmap_range((void *)fixmap,
 						(p-fixmap) * sizeof(*p));
 			if (mapped)
 				patch_unmap(FIX_TEXT_POKE0, &flags);
@@ -90,9 +90,10 @@  void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len)
 		}
 	}

-	flush_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p));
+	invalidate_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p));
 	if (mapped)
 		patch_unmap(FIX_TEXT_POKE0, &flags);
+	invalidate_kernel_vmap_range((void *)start, end - start);
 	flush_icache_range(start, end);
 }