[2/2] ARM: mm: make text and rodata read-only

Message ID	20140404195818.GA21028@debian (mailing list archive)
State	New, archived
Headers	Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org>
	Date: Fri, 4 Apr 2014 21:58:18 +0200
	From: Rabin Vincent <rabin@rab.in>
	To: Kees Cook <keescook@chromium.org>
	Subject: Re: [PATCH 2/2] ARM: mm: make text and rodata read-only
	Message-ID: <20140404195818.GA21028@debian>
	References: <1396577719-14786-1-git-send-email-keescook@chromium.org> <1396577719-14786-3-git-send-email-keescook@chromium.org>
	MIME-Version: 1.0
	Content-Disposition: inline
	In-Reply-To: <1396577719-14786-3-git-send-email-keescook@chromium.org>
	User-Agent: Mutt/1.5.23 (2014-03-12)
	Cc: Russell King <linux@arm.linux.org.uk>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will.deacon@arm.com>, linux-kernel@vger.kernel.org, Laura Abbott <lauraa@codeaurora.org>, Alexander Holler <holler@ahsoftware.de>, linux-arm-kernel@lists.infradead.org
	Precedence: list
	Content-Type: text/plain; charset="us-ascii"
	Content-Transfer-Encoding: 7bit
	Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
	Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Rabin Vincent April 4, 2014, 7:58 p.m. UTC

On Thu, Apr 03, 2014 at 07:15:19PM -0700, Kees Cook wrote:
> diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
> index 34e56647dcee..4ae343c1e2a3 100644
> --- a/arch/arm/kernel/ftrace.c
> +++ b/arch/arm/kernel/ftrace.c
> @@ -14,6 +14,7 @@
>  
>  #include <linux/ftrace.h>
>  #include <linux/uaccess.h>
> +#include <linux/stop_machine.h>
>  
>  #include <asm/cacheflush.h>
>  #include <asm/opcodes.h>
> @@ -34,6 +35,22 @@
>  
>  #define	OLD_NOP		0xe1a00000	/* mov r0, r0 */
>  
> +static int __ftrace_modify_code(void *data)

This is in the CONFIG_OLD_MCOUNT ifdef, but should be in the outer ifdef
(CONFIG_DYNAMIC_FTRACE) instead, otherwise it will not get enabled for,
for example, Thumb-2 kernels.  This was wrong in my example patch too.

> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 8539eb2a01ad..3baac4ad165f 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -681,30 +716,52 @@ static inline bool arch_has_strict_perms(void)
>  	return true;
>  }
>  
> +#define set_section_perms(perms, field)	{				\
> +	size_t i;							\
> +	unsigned long addr;						\
> +									\
> +	if (!arch_has_strict_perms())					\
> +		return;							\
> +									\
> +	for (i = 0; i < ARRAY_SIZE(perms); i++) {			\
> +		if (!IS_ALIGNED(perms[i].start, SECTION_SIZE) ||	\
> +		    !IS_ALIGNED(perms[i].end, SECTION_SIZE)) {		\
> +			pr_err("BUG: section %lx-%lx not aligned to %lx\n", \
> +				perms[i].start, perms[i].end,		\
> +				SECTION_SIZE);				\
> +			continue;					\
> +		}							\
> +									\
> +		for (addr = perms[i].start;				\
> +		     addr < perms[i].end;				\
> +		     addr += SECTION_SIZE)				\
> +			section_update(addr, perms[i].mask,		\
> +				       perms[i].field);			\
> +	}								\
> +}
> +
>  static inline void fix_kernmem_perms(void)
>  {
> -	unsigned long addr;
> -	unsigned int i;
> +	set_section_perms(nx_perms, prot);
> +}
>  
> -	if (!arch_has_strict_perms())
> -		return;
> +#ifdef CONFIG_DEBUG_RODATA
> +void mark_rodata_ro(void)
> +{
> +	set_section_perms(ro_perms, prot);
> +}
>  
> -	for (i = 0; i < ARRAY_SIZE(section_perms); i++) {
> -		if (!IS_ALIGNED(section_perms[i].start, SECTION_SIZE) ||
> -		    !IS_ALIGNED(section_perms[i].end, SECTION_SIZE)) {
> -			pr_err("BUG: section %lx-%lx not aligned to %lx\n",
> -				section_perms[i].start, section_perms[i].end,
> -				SECTION_SIZE);
> -			continue;
> -		}
> +void set_kernel_text_rw(void)
> +{
> +	set_section_perms(ro_perms, clear);
> +}

You need a TLB flush.  I had a flush_tlb_all() in my example patch,
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/244335.html,
but the following is probably nicer (on top of this patch):

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 9bea524..a92c45a 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -741,6 +741,8 @@ static inline bool arch_has_strict_perms(void)
                     addr += SECTION_SIZE)                              \
                        section_update(addr, perms[i].mask,             \
                                       perms[i].field);                 \
+                                                                       \
+               flush_tlb_kernel_range(perms[i].start, perms[i].end);   \
        }                                                               \
 }

Kees Cook April 5, 2014, 12:07 a.m. UTC | #1

On Fri, Apr 4, 2014 at 12:58 PM, Rabin Vincent <rabin@rab.in> wrote:
> On Thu, Apr 03, 2014 at 07:15:19PM -0700, Kees Cook wrote:
>> diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
>> index 34e56647dcee..4ae343c1e2a3 100644
>> --- a/arch/arm/kernel/ftrace.c
>> +++ b/arch/arm/kernel/ftrace.c
>> @@ -14,6 +14,7 @@
>>
>>  #include <linux/ftrace.h>
>>  #include <linux/uaccess.h>
>> +#include <linux/stop_machine.h>
>>
>>  #include <asm/cacheflush.h>
>>  #include <asm/opcodes.h>
>> @@ -34,6 +35,22 @@
>>
>>  #define      OLD_NOP         0xe1a00000      /* mov r0, r0 */
>>
>> +static int __ftrace_modify_code(void *data)
>
> This is in the CONFIG_OLD_MCOUNT ifdef, but should be in the outer ifdef
> (CONFIG_DYNAMIC_FTRACE) instead, otherwise it will not get enabled for,
> for example, Thumb-2 kernels.  This was wrong in my example patch too.

Ah! Yes, good point. I've moved this now.

>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>> index 8539eb2a01ad..3baac4ad165f 100644
>> --- a/arch/arm/mm/init.c
>> +++ b/arch/arm/mm/init.c
>> @@ -681,30 +716,52 @@ static inline bool arch_has_strict_perms(void)
>>       return true;
>>  }
>>
>> +#define set_section_perms(perms, field)      {                               \
>> +     size_t i;                                                       \
>> +     unsigned long addr;                                             \
>> +                                                                     \
>> +     if (!arch_has_strict_perms())                                   \
>> +             return;                                                 \
>> +                                                                     \
>> +     for (i = 0; i < ARRAY_SIZE(perms); i++) {                       \
>> +             if (!IS_ALIGNED(perms[i].start, SECTION_SIZE) ||        \
>> +                 !IS_ALIGNED(perms[i].end, SECTION_SIZE)) {          \
>> +                     pr_err("BUG: section %lx-%lx not aligned to %lx\n", \
>> +                             perms[i].start, perms[i].end,           \
>> +                             SECTION_SIZE);                          \
>> +                     continue;                                       \
>> +             }                                                       \
>> +                                                                     \
>> +             for (addr = perms[i].start;                             \
>> +                  addr < perms[i].end;                               \
>> +                  addr += SECTION_SIZE)                              \
>> +                     section_update(addr, perms[i].mask,             \
>> +                                    perms[i].field);                 \
>> +     }                                                               \
>> +}
>> +
>>  static inline void fix_kernmem_perms(void)
>>  {
>> -     unsigned long addr;
>> -     unsigned int i;
>> +     set_section_perms(nx_perms, prot);
>> +}
>>
>> -     if (!arch_has_strict_perms())
>> -             return;
>> +#ifdef CONFIG_DEBUG_RODATA
>> +void mark_rodata_ro(void)
>> +{
>> +     set_section_perms(ro_perms, prot);
>> +}
>>
>> -     for (i = 0; i < ARRAY_SIZE(section_perms); i++) {
>> -             if (!IS_ALIGNED(section_perms[i].start, SECTION_SIZE) ||
>> -                 !IS_ALIGNED(section_perms[i].end, SECTION_SIZE)) {
>> -                     pr_err("BUG: section %lx-%lx not aligned to %lx\n",
>> -                             section_perms[i].start, section_perms[i].end,
>> -                             SECTION_SIZE);
>> -                     continue;
>> -             }
>> +void set_kernel_text_rw(void)
>> +{
>> +     set_section_perms(ro_perms, clear);
>> +}
>
> You need a TLB flush.  I had a flush_tlb_all() in my example patch,
> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/244335.html,
> but the following is probably nicer (on top of this patch):
>
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 9bea524..a92c45a 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -741,6 +741,8 @@ static inline bool arch_has_strict_perms(void)
>                      addr += SECTION_SIZE)                              \
>                         section_update(addr, perms[i].mask,             \
>                                        perms[i].field);                 \
> +                                                                       \
> +               flush_tlb_kernel_range(perms[i].start, perms[i].end);   \
>         }                                                               \
>  }
>

When I do this, I hang the system, and get a WARN due to the tlb call
attempting to flush on all CPUs, I think:

[   34.246034] WARNING: at
/mnt/host/source/src/third_party/kernel-next/kernel/smp.c:466
smp_call_function_many+0xac/0x26c()
...
[   34.246617] Backtrace:
[   34.246697] [<c010d3b8>] (unwind_backtrace+0x0/0x118) from
[<c060b9d8>] (dump_stack+0x28/0x30)
[   34.246765] [<c060b9d8>] (dump_stack+0x28/0x30) from [<c0123044>]
(warn_slowpath_null+0x44/0x5c)
[   34.246824] [<c0123044>] (warn_slowpath_null+0x44/0x5c) from
[<c017426c>] (smp_call_function_many+0xac/0x26c)
[   34.246881] [<c017426c>] (smp_call_function_many+0xac/0x26c) from
[<c0174468>] (smp_call_function+0x3c/0x48)
[   34.246937] [<c0174468>] (smp_call_function+0x3c/0x48) from
[<c010c0fc>] (broadcast_tlb_a15_erratum+0x40/0x4c)
[   34.246994] [<c010c0fc>] (broadcast_tlb_a15_erratum+0x40/0x4c) from
[<c010c590>] (flush_tlb_kernel_range+0x74/0xa0)
[   34.247046] [<c010c590>] (flush_tlb_kernel_range+0x74/0xa0) from
[<c011403c>] (set_kernel_text_rw+0xd8/0xec)
[   34.247099] [<c011403c>] (set_kernel_text_rw+0xd8/0xec) from
[<c010c878>] (__ftrace_modify_code+0x14/0x28)
[   34.247156] [<c010c878>] (__ftrace_modify_code+0x14/0x28) from
[<c0184318>] (stop_machine_cpu_stop+0xc0/0x114)
[   34.247212] [<c0184318>] (stop_machine_cpu_stop+0xc0/0x114) from
[<c01841cc>] (cpu_stopper_thread+0xd8/0x164)
[   34.247266] [<c01841cc>] (cpu_stopper_thread+0xd8/0x164) from
[<c0145c14>] (kthread+0xc8/0xd8)
[   34.247323] [<c0145c14>] (kthread+0xc8/0xd8) from [<c0106118>]
(ret_from_fork+0x14/0x20)

Using local_flush_tlb_kernel_range() fixed it though. Thank you for
your help on this! :)

-Kees

Jon Medhurst (Tixy) April 8, 2014, 12:41 p.m. UTC | #2

On Fri, 2014-04-04 at 17:07 -0700, Kees Cook wrote:
> On Fri, Apr 4, 2014 at 12:58 PM, Rabin Vincent <rabin@rab.in> wrote:
[...]
> > You need a TLB flush.  I had a flush_tlb_all() in my example patch,
> > http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/244335.html,
> > but the following is probably nicer (on top of this patch):
> >
> > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> > index 9bea524..a92c45a 100644
> > --- a/arch/arm/mm/init.c
> > +++ b/arch/arm/mm/init.c
> > @@ -741,6 +741,8 @@ static inline bool arch_has_strict_perms(void)
> >                      addr += SECTION_SIZE)                              \
> >                         section_update(addr, perms[i].mask,             \
> >                                        perms[i].field);                 \
> > +                                                                       \
> > +               flush_tlb_kernel_range(perms[i].start, perms[i].end);   \
> >         }                                                               \
> >  }
> >
> 
> When I do this, I hang the system, and get a WARN due to the tlb call
> attempting to flush on all CPUs, I think:
> 
> [   34.246034] WARNING: at
> /mnt/host/source/src/third_party/kernel-next/kernel/smp.c:466
> smp_call_function_many+0xac/0x26c()
> ...
> [   34.246617] Backtrace:
> [   34.246697] [<c010d3b8>] (unwind_backtrace+0x0/0x118) from
> [<c060b9d8>] (dump_stack+0x28/0x30)
> [   34.246765] [<c060b9d8>] (dump_stack+0x28/0x30) from [<c0123044>]
> (warn_slowpath_null+0x44/0x5c)
> [   34.246824] [<c0123044>] (warn_slowpath_null+0x44/0x5c) from
> [<c017426c>] (smp_call_function_many+0xac/0x26c)
> [   34.246881] [<c017426c>] (smp_call_function_many+0xac/0x26c) from
> [<c0174468>] (smp_call_function+0x3c/0x48)
> [   34.246937] [<c0174468>] (smp_call_function+0x3c/0x48) from
> [<c010c0fc>] (broadcast_tlb_a15_erratum+0x40/0x4c)
> [   34.246994] [<c010c0fc>] (broadcast_tlb_a15_erratum+0x40/0x4c) from
> [<c010c590>] (flush_tlb_kernel_range+0x74/0xa0)
> [   34.247046] [<c010c590>] (flush_tlb_kernel_range+0x74/0xa0) from
> [<c011403c>] (set_kernel_text_rw+0xd8/0xec)
> [   34.247099] [<c011403c>] (set_kernel_text_rw+0xd8/0xec) from
> [<c010c878>] (__ftrace_modify_code+0x14/0x28)
> [   34.247156] [<c010c878>] (__ftrace_modify_code+0x14/0x28) from
> [<c0184318>] (stop_machine_cpu_stop+0xc0/0x114)
> [   34.247212] [<c0184318>] (stop_machine_cpu_stop+0xc0/0x114) from
> [<c01841cc>] (cpu_stopper_thread+0xd8/0x164)
> [   34.247266] [<c01841cc>] (cpu_stopper_thread+0xd8/0x164) from
> [<c0145c14>] (kthread+0xc8/0xd8)
> [   34.247323] [<c0145c14>] (kthread+0xc8/0xd8) from [<c0106118>]
> (ret_from_fork+0x14/0x20)
> 
> Using local_flush_tlb_kernel_range() fixed it though.

What about if another CPU had a TLB entry with the old permissions in?
Or do you consider that the likelihood and consequences of that aren't
significant?

Kees Cook April 8, 2014, 4:01 p.m. UTC | #3

On Tue, Apr 8, 2014 at 5:41 AM, Jon Medhurst (Tixy) <tixy@linaro.org> wrote:
> On Fri, 2014-04-04 at 17:07 -0700, Kees Cook wrote:
>> On Fri, Apr 4, 2014 at 12:58 PM, Rabin Vincent <rabin@rab.in> wrote:
> [...]
>> > You need a TLB flush.  I had a flush_tlb_all() in my example patch,
>> > http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/244335.html,
>> > but the following is probably nicer (on top of this patch):
>> >
>> > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>> > index 9bea524..a92c45a 100644
>> > --- a/arch/arm/mm/init.c
>> > +++ b/arch/arm/mm/init.c
>> > @@ -741,6 +741,8 @@ static inline bool arch_has_strict_perms(void)
>> >                      addr += SECTION_SIZE)                              \
>> >                         section_update(addr, perms[i].mask,             \
>> >                                        perms[i].field);                 \
>> > +                                                                       \
>> > +               flush_tlb_kernel_range(perms[i].start, perms[i].end);   \
>> >         }                                                               \
>> >  }
>> >
>>
>> When I do this, I hang the system, and get a WARN due to the tlb call
>> attempting to flush on all CPUs, I think:
>>
>> [   34.246034] WARNING: at
>> /mnt/host/source/src/third_party/kernel-next/kernel/smp.c:466
>> smp_call_function_many+0xac/0x26c()
>> ...
>> [   34.246617] Backtrace:
>> [   34.246697] [<c010d3b8>] (unwind_backtrace+0x0/0x118) from
>> [<c060b9d8>] (dump_stack+0x28/0x30)
>> [   34.246765] [<c060b9d8>] (dump_stack+0x28/0x30) from [<c0123044>]
>> (warn_slowpath_null+0x44/0x5c)
>> [   34.246824] [<c0123044>] (warn_slowpath_null+0x44/0x5c) from
>> [<c017426c>] (smp_call_function_many+0xac/0x26c)
>> [   34.246881] [<c017426c>] (smp_call_function_many+0xac/0x26c) from
>> [<c0174468>] (smp_call_function+0x3c/0x48)
>> [   34.246937] [<c0174468>] (smp_call_function+0x3c/0x48) from
>> [<c010c0fc>] (broadcast_tlb_a15_erratum+0x40/0x4c)
>> [   34.246994] [<c010c0fc>] (broadcast_tlb_a15_erratum+0x40/0x4c) from
>> [<c010c590>] (flush_tlb_kernel_range+0x74/0xa0)
>> [   34.247046] [<c010c590>] (flush_tlb_kernel_range+0x74/0xa0) from
>> [<c011403c>] (set_kernel_text_rw+0xd8/0xec)
>> [   34.247099] [<c011403c>] (set_kernel_text_rw+0xd8/0xec) from
>> [<c010c878>] (__ftrace_modify_code+0x14/0x28)
>> [   34.247156] [<c010c878>] (__ftrace_modify_code+0x14/0x28) from
>> [<c0184318>] (stop_machine_cpu_stop+0xc0/0x114)
>> [   34.247212] [<c0184318>] (stop_machine_cpu_stop+0xc0/0x114) from
>> [<c01841cc>] (cpu_stopper_thread+0xd8/0x164)
>> [   34.247266] [<c01841cc>] (cpu_stopper_thread+0xd8/0x164) from
>> [<c0145c14>] (kthread+0xc8/0xd8)
>> [   34.247323] [<c0145c14>] (kthread+0xc8/0xd8) from [<c0106118>]
>> (ret_from_fork+0x14/0x20)
>>
>> Using local_flush_tlb_kernel_range() fixed it though.
>
> What about if another CPU had a TLB entry with the old permissions in?
> Or do you consider that the likelihood and consequences of that aren't
> significant?

The purpose of the function is to temporarily make text writable, do
the write, and then restore read-only. Since only the writer needs to
care about TLB state, this works fine. It's actually nice that only
the current CPU can make text writes.

-Kees
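
The make-writable, write, restore-read-only sequence described above can
be sketched in userspace with mprotect(). This is only a loose analogue,
not the kernel code under review: it assumes a Linux/POSIX system, the
helper names alloc_readonly_region() and patch_readonly_region() are
hypothetical, and mprotect() takes care of TLB maintenance on all CPUs
itself, so the sketch says nothing about the local-versus-global flush
question.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map a zero-filled, read-only anonymous region. */
void *alloc_readonly_region(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: copy len bytes from src to region + off.
 * The region is made writable only for the duration of the copy and
 * is read-only again on successful return. Returns 0 on success and
 * -1 on a bounds or mprotect() failure.
 */
int patch_readonly_region(void *region, size_t region_len, size_t off,
			  const void *src, size_t len)
{
	assert(region != NULL);

	/* Reject writes that would run past the end of the region. */
	if (off > region_len || len > region_len - off)
		return -1;

	/* Step 1: temporarily make the region writable. */
	if (mprotect(region, region_len, PROT_READ | PROT_WRITE) != 0)
		return -1;

	/* Step 2: do the write. */
	memcpy((char *)region + off, src, len);

	/* Step 3: restore read-only. */
	if (mprotect(region, region_len, PROT_READ) != 0)
		return -1;

	return 0;
}
```

A caller would allocate one page with alloc_readonly_region(), then pass
the bytes to patch in (for instance the four bytes of OLD_NOP) to
patch_readonly_region(); any direct store to the page outside that
helper would fault.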

Jon Medhurst (Tixy) April 8, 2014, 4:12 p.m. UTC | #4

On Tue, 2014-04-08 at 09:01 -0700, Kees Cook wrote:
> On Tue, Apr 8, 2014 at 5:41 AM, Jon Medhurst (Tixy) <tixy@linaro.org> wrote:
> > On Fri, 2014-04-04 at 17:07 -0700, Kees Cook wrote:
> >> On Fri, Apr 4, 2014 at 12:58 PM, Rabin Vincent <rabin@rab.in> wrote:
> > [...]
> >> > You need a TLB flush.  I had a flush_tlb_all() in my example patch,
> >> > http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/244335.html,
> >> > but the following is probably nicer (on top of this patch):
> >> >
> >> > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> >> > index 9bea524..a92c45a 100644
> >> > --- a/arch/arm/mm/init.c
> >> > +++ b/arch/arm/mm/init.c
> >> > @@ -741,6 +741,8 @@ static inline bool arch_has_strict_perms(void)
> >> >                      addr += SECTION_SIZE)                              \
> >> >                         section_update(addr, perms[i].mask,             \
> >> >                                        perms[i].field);                 \
> >> > +                                                                       \
> >> > +               flush_tlb_kernel_range(perms[i].start, perms[i].end);   \
> >> >         }                                                               \
> >> >  }
> >> >
> >>
> >> When I do this, I hang the system, and get a WARN due to the tlb call
> >> attempting to flush on all CPUs, I think:
> >>
> >> [   34.246034] WARNING: at
> >> /mnt/host/source/src/third_party/kernel-next/kernel/smp.c:466
> >> smp_call_function_many+0xac/0x26c()
> >> ...
> >> [   34.246617] Backtrace:
> >> [   34.246697] [<c010d3b8>] (unwind_backtrace+0x0/0x118) from
> >> [<c060b9d8>] (dump_stack+0x28/0x30)
> >> [   34.246765] [<c060b9d8>] (dump_stack+0x28/0x30) from [<c0123044>]
> >> (warn_slowpath_null+0x44/0x5c)
> >> [   34.246824] [<c0123044>] (warn_slowpath_null+0x44/0x5c) from
> >> [<c017426c>] (smp_call_function_many+0xac/0x26c)
> >> [   34.246881] [<c017426c>] (smp_call_function_many+0xac/0x26c) from
> >> [<c0174468>] (smp_call_function+0x3c/0x48)
> >> [   34.246937] [<c0174468>] (smp_call_function+0x3c/0x48) from
> >> [<c010c0fc>] (broadcast_tlb_a15_erratum+0x40/0x4c)
> >> [   34.246994] [<c010c0fc>] (broadcast_tlb_a15_erratum+0x40/0x4c) from
> >> [<c010c590>] (flush_tlb_kernel_range+0x74/0xa0)
> >> [   34.247046] [<c010c590>] (flush_tlb_kernel_range+0x74/0xa0) from
> >> [<c011403c>] (set_kernel_text_rw+0xd8/0xec)
> >> [   34.247099] [<c011403c>] (set_kernel_text_rw+0xd8/0xec) from
> >> [<c010c878>] (__ftrace_modify_code+0x14/0x28)
> >> [   34.247156] [<c010c878>] (__ftrace_modify_code+0x14/0x28) from
> >> [<c0184318>] (stop_machine_cpu_stop+0xc0/0x114)
> >> [   34.247212] [<c0184318>] (stop_machine_cpu_stop+0xc0/0x114) from
> >> [<c01841cc>] (cpu_stopper_thread+0xd8/0x164)
> >> [   34.247266] [<c01841cc>] (cpu_stopper_thread+0xd8/0x164) from
> >> [<c0145c14>] (kthread+0xc8/0xd8)
> >> [   34.247323] [<c0145c14>] (kthread+0xc8/0xd8) from [<c0106118>]
> >> (ret_from_fork+0x14/0x20)
> >>
> >> Using local_flush_tlb_kernel_range() fixed it though.
> >
> > What about if another CPU had a TLB entry with the old permissions in?
> > Or do you consider that the likelihood and consequences of that aren't
> > significant?
> 
> The purpose of the function is to temporarily make text writable, do
> the write, and then restore read-only. Since only the writer needs to
> care about TLB state, this works fine. It's actually nice that only
> the current CPU can make text writes.

And is the page table being modified unique to the current CPU? I
thought a common set of page tables was shared across all of them. If
that is the case then one CPU can modify the PTE to be writeable,
another CPU take a TLB miss and pull in that writeable entry, which will
stay there until it drops out the TLB at some indefinite point in the
future. That's the scenario I was getting at with my previous comment.
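
That scenario can be walked through with a toy userspace model. It is a
sketch under stated assumptions, not kernel code: one entry shared by
all CPUs stands in for the section mapping, each CPU has a one-entry
TLB, and every toy_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 2

/* Toy stand-in for a section entry in a page table shared by all CPUs. */
struct toy_pte { bool writable; };

/* Toy one-entry TLB, one per CPU. */
struct toy_tlb { bool valid; bool writable; };

struct toy_pte toy_shared_pte;
struct toy_tlb toy_tlbs[TOY_NR_CPUS];

/* A lookup on an invalid entry is a TLB miss and refills from the table. */
bool toy_can_write(int cpu)
{
	if (!toy_tlbs[cpu].valid) {
		toy_tlbs[cpu].valid = true;
		toy_tlbs[cpu].writable = toy_shared_pte.writable;
	}
	return toy_tlbs[cpu].writable;
}

/* Invalidate only the calling CPU's cached translation. */
void toy_local_flush(int cpu)
{
	toy_tlbs[cpu].valid = false;
}

/* Invalidate the cached translation on every CPU. */
void toy_flush_all(void)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_local_flush(cpu);
}

/* Change the shared entry, then flush only the modifying CPU's TLB. */
void toy_set_perms(int cpu, bool writable)
{
	toy_shared_pte.writable = writable;
	toy_local_flush(cpu);
}

/*
 * CPU 0 makes the entry writable, CPU 1 takes a miss in that window,
 * CPU 0 restores read-only. Returns whether CPU 1 can still write.
 */
bool toy_stale_after_local_flush(void)
{
	toy_shared_pte.writable = false;
	toy_flush_all();

	toy_set_perms(0, true);		/* CPU 0: make text writable */
	(void)toy_can_write(1);		/* CPU 1: miss, caches writable entry */
	toy_set_perms(0, false);	/* CPU 0: restore read-only */

	return toy_can_write(1);	/* CPU 1 still holds the old entry */
}
```

In this model toy_stale_after_local_flush() reports that CPU 1 keeps its
writable entry after CPU 0 has restored read-only, and only a later
toy_flush_all() removes it. Whether the real hardware and page-table
layout behave like the model is exactly the question being asked here.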

Kees Cook April 8, 2014, 4:59 p.m. UTC | #5

On Tue, Apr 8, 2014 at 9:12 AM, Jon Medhurst (Tixy) <tixy@linaro.org> wrote:
> On Tue, 2014-04-08 at 09:01 -0700, Kees Cook wrote:
>> On Tue, Apr 8, 2014 at 5:41 AM, Jon Medhurst (Tixy) <tixy@linaro.org> wrote:
>> > On Fri, 2014-04-04 at 17:07 -0700, Kees Cook wrote:
>> >> On Fri, Apr 4, 2014 at 12:58 PM, Rabin Vincent <rabin@rab.in> wrote:
>> > [...]
>> >> > You need a TLB flush.  I had a flush_tlb_all() in my example patch,
>> >> > http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/244335.html,
>> >> > but the following is probably nicer (on top of this patch):
>> >> >
>> >> > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>> >> > index 9bea524..a92c45a 100644
>> >> > --- a/arch/arm/mm/init.c
>> >> > +++ b/arch/arm/mm/init.c
>> >> > @@ -741,6 +741,8 @@ static inline bool arch_has_strict_perms(void)
>> >> >                      addr += SECTION_SIZE)                              \
>> >> >                         section_update(addr, perms[i].mask,             \
>> >> >                                        perms[i].field);                 \
>> >> > +                                                                       \
>> >> > +               flush_tlb_kernel_range(perms[i].start, perms[i].end);   \
>> >> >         }                                                               \
>> >> >  }
>> >> >
>> >>
>> >> When I do this, I hang the system, and get a WARN due to the tlb call
>> >> attempting to flush on all CPUs, I think:
>> >>
>> >> [   34.246034] WARNING: at
>> >> /mnt/host/source/src/third_party/kernel-next/kernel/smp.c:466
>> >> smp_call_function_many+0xac/0x26c()
>> >> ...
>> >> [   34.246617] Backtrace:
>> >> [   34.246697] [<c010d3b8>] (unwind_backtrace+0x0/0x118) from
>> >> [<c060b9d8>] (dump_stack+0x28/0x30)
>> >> [   34.246765] [<c060b9d8>] (dump_stack+0x28/0x30) from [<c0123044>]
>> >> (warn_slowpath_null+0x44/0x5c)
>> >> [   34.246824] [<c0123044>] (warn_slowpath_null+0x44/0x5c) from
>> >> [<c017426c>] (smp_call_function_many+0xac/0x26c)
>> >> [   34.246881] [<c017426c>] (smp_call_function_many+0xac/0x26c) from
>> >> [<c0174468>] (smp_call_function+0x3c/0x48)
>> >> [   34.246937] [<c0174468>] (smp_call_function+0x3c/0x48) from
>> >> [<c010c0fc>] (broadcast_tlb_a15_erratum+0x40/0x4c)
>> >> [   34.246994] [<c010c0fc>] (broadcast_tlb_a15_erratum+0x40/0x4c) from
>> >> [<c010c590>] (flush_tlb_kernel_range+0x74/0xa0)
>> >> [   34.247046] [<c010c590>] (flush_tlb_kernel_range+0x74/0xa0) from
>> >> [<c011403c>] (set_kernel_text_rw+0xd8/0xec)
>> >> [   34.247099] [<c011403c>] (set_kernel_text_rw+0xd8/0xec) from
>> >> [<c010c878>] (__ftrace_modify_code+0x14/0x28)
>> >> [   34.247156] [<c010c878>] (__ftrace_modify_code+0x14/0x28) from
>> >> [<c0184318>] (stop_machine_cpu_stop+0xc0/0x114)
>> >> [   34.247212] [<c0184318>] (stop_machine_cpu_stop+0xc0/0x114) from
>> >> [<c01841cc>] (cpu_stopper_thread+0xd8/0x164)
>> >> [   34.247266] [<c01841cc>] (cpu_stopper_thread+0xd8/0x164) from
>> >> [<c0145c14>] (kthread+0xc8/0xd8)
>> >> [   34.247323] [<c0145c14>] (kthread+0xc8/0xd8) from [<c0106118>]
>> >> (ret_from_fork+0x14/0x20)
>> >>
>> >> Using local_flush_tlb_kernel_range() fixed it though.
>> >
>> > What about if another CPU had a TLB entry with the old permissions in?
>> > Or do you consider that the likelihood and consequences of that aren't
>> > significant?
>>
>> The purpose of the function is to temporarily make text writable, do
>> the write, and then restore read-only. Since only the writer needs to
>> care about TLB state, this works fine. It's actually nice that only
>> the current CPU can make text writes.
>
> And is the page table being modified unique to the current CPU? I
> thought a common set of page tables was shared across all of them. If
> that is the case then one CPU can modify the PTE to be writeable,
> another CPU take a TLB miss and pull in that writeable entry, which will
> stay there until it drops out the TLB at some indefinite point in the
> future. That's the scenario I was getting at with my previous comment.

As I understood it, this would be true for small PTEs, but sections
are fully duplicated on each CPU so we don't run that risk. This was
the whole source of my problem with this patch series: even a full
all-CPU TLB flush wasn't working -- the section permissions were
unique to the CPU since the entries were duplicated.

-Kees

Rabin Vincent April 8, 2014, 7:48 p.m. UTC | #6

On Tue, Apr 08, 2014 at 09:59:07AM -0700, Kees Cook wrote:
> On Tue, Apr 8, 2014 at 9:12 AM, Jon Medhurst (Tixy) <tixy@linaro.org> wrote:
> > And is the page table being modified unique to the current CPU? I
> > thought a common set of page tables was shared across all of them. If
> > that is the case then one CPU can modify the PTE to be writeable,
> > another CPU take a TLB miss and pull in that writeable entry, which will
> > stay there until it drops out the TLB at some indefinite point in the
> > future. That's the scenario I was getting at with my previous comment.
> 
> As I understood it, this would be true for small PTEs, but sections
> are fully duplicated on each CPU so we don't run that risk. This was
> the whole source of my problem with this patch series: even a full
> all-CPU TLB flush wasn't working -- the section permissions were
> unique to the CPU since the entries were duplicated.

The PGD is per-mm_struct.  mm_structs can be shared between processes.
So the PGD is not per CPU.

This set_kernel_text_rw() is called from ftrace in stop_machine() on one
CPU.  All other CPUs will be spinning in kernel threads inside the loop
in multi_cpu_stop(), with interrupts disabled.  Since kernel threads use
the last process' mm, it is possible for the other CPU(s) to be
currently using the same mm as the modifying CPU.

For any other CPU to pull in the writable entry it would have to get a
TLB miss inside the loop in multi_cpu_stop(), after the state transition
to MULTI_STOP_RUN and before the state transition to MULTI_STOP_EXIT.
This is unlikely, but theoretically possible, for example if
multi_cpu_stop() straddles sections.

To prevent any stale entries being used indefinitely, perhaps an
all-CPU TLB flush can be inserted into
ftrace_arch_code_modify_post_process(), which is called after the
stop_machine() and which is where x86, for example, makes the entries
read-only again.

Jon Medhurst (Tixy) April 9, 2014, 10:29 a.m. UTC | #7

On Tue, 2014-04-08 at 21:48 +0200, Rabin Vincent wrote:
[...]
> For any other CPU to pull in the writable entry it would have to get a
> TLB miss inside the loop in multi_cpu_stop(), after the state transition
> to MULTI_STOP_RUN and before the state transition to MULTI_STOP_EXIT.
> This is unlikely, but theoretically possible, for example if
> multi_cpu_stop() straddles sections.

With speculative execution it is also possible for the CPU to fill the
TLB with entries for a memory address that the program would never
actually access. Basically, whatever is in the MMU registers and page
tables at any given time, the CPU can speculatively use that address
translation and read that memory. And if it's marked cacheable, pull it
into the cache. Oh, and if there is a dirty cacheline in another
CPU's or cluster's cache, move that dirty entry over into its own cache (I
believe).
