
[RFC,3/7] module: modify memory attrs for __ro_mostly_after_init during module_init/exit

Message ID 1487498660-16600-3-git-send-email-hoeun.ryu@gmail.com (mailing list archive)
State New, archived

Commit Message

Hoeun Ryu Feb. 19, 2017, 10:04 a.m. UTC
`__ro_mostly_after_init` is almost like `__ro_after_init`: the section is
read-only after kernel init, just as `__ro_after_init` is. This patch makes
the `__ro_mostly_after_init` section temporarily read-write, but only during
module_init/module_exit.

Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com>
---
 kernel/module.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Mark Rutland Feb. 20, 2017, 10:30 a.m. UTC | #1
On Sun, Feb 19, 2017 at 07:04:06PM +0900, Hoeun Ryu wrote:
>  `__ro_mostly_after_init` is almost like `__ro_after_init`. The section is
> read-only as same as `__ro_after_init` after kernel init. This patch makes
> `__ro_mostly_after_init` section read-write temporarily only during
> module_init/module_exit.
> 
> Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com>
> ---
>  kernel/module.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 7eba6de..3b25e0e 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -987,8 +987,11 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
>  
>  	mutex_unlock(&module_mutex);
>  	/* Final destruction now no one is using it. */
> -	if (mod->exit != NULL)
> +	if (mod->exit != NULL) {
> +		set_ro_mostly_after_init_rw();
>  		mod->exit();
> +		set_ro_mostly_after_init_ro();
> +	}
>  	blocking_notifier_call_chain(&module_notify_list,
>  				     MODULE_STATE_GOING, mod);
>  	klp_module_going(mod);
> @@ -3396,8 +3399,11 @@ static noinline int do_init_module(struct module *mod)
>  
>  	do_mod_ctors(mod);
>  	/* Start the module */
> -	if (mod->init != NULL)
> +	if (mod->init != NULL) {
> +		set_ro_mostly_after_init_rw();
>  		ret = do_one_initcall(mod->init);
> +		set_ro_mostly_after_init_ro();
> +	}

This looks very much like the pax_{open,close}_kernel() approach for
write-rarely data.

I think it would be better to implement a first class write-rarely
mechanism rather than trying to extend __ro_after_init to cover this
case.
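
For readers unfamiliar with the open/close pattern, here is a minimal
user-space sketch of it. It is only an analogy, not the kernel
implementation: mprotect() stands in for whatever
set_ro_mostly_after_init_{rw,ro}() do in this series, and the names
rare_region, region_open(), region_close() and run_init_hook() are
hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for a section that is normally read-only. */
struct rare_region {
	void *base;
	size_t len;
};

/* Allocate one page, then drop write permission (the "after init" state). */
static int region_init(struct rare_region *r)
{
	r->len = (size_t)sysconf(_SC_PAGESIZE);
	r->base = mmap(NULL, r->len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (r->base == MAP_FAILED)
		return -1;
	return mprotect(r->base, r->len, PROT_READ);
}

/* "open": the whole region becomes writable for every writer. */
static int region_open(struct rare_region *r)
{
	return mprotect(r->base, r->len, PROT_READ | PROT_WRITE);
}

/* "close": back to read-only. */
static int region_close(struct rare_region *r)
{
	return mprotect(r->base, r->len, PROT_READ);
}

/* Same shape as the patched do_init_module(): open, run the hook, close. */
static int run_init_hook(struct rare_region *r, int (*init)(void *data))
{
	int ret;

	assert(init != NULL);
	if (region_open(r))
		return -1;
	ret = init(r->base);
	if (region_close(r))
		return -1;
	return ret;
}

/* Example hook that writes to the protected data. */
static int sample_init(void *data)
{
	*(int *)data = 42;
	return 0;
}
```

Note that while the region is open, any code running in that window can
write to it through the ordinary address, not just the hook.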

As mentioned previously, I *think* we can have a generic implementation
that uses an mm to temporarily map a (thread/cpu-local) RW alias of the
data in question in what would otherwise be the user half of the address
space. Regardless, we can have a generic interface [1] that can cater
for that style of approach and/or something like ARM's domains or x86's
pkeys.

Thanks,
Mark.

[1] http://www.openwall.com/lists/kernel-hardening/2016/11/18/3
Hoeun Ryu Feb. 21, 2017, 1:36 p.m. UTC | #2
> On 20 Feb 2017, at 7:30 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> 
> On Sun, Feb 19, 2017 at 07:04:06PM +0900, Hoeun Ryu wrote:
>> `__ro_mostly_after_init` is almost like `__ro_after_init`. The section is
>> read-only as same as `__ro_after_init` after kernel init. This patch makes
>> `__ro_mostly_after_init` section read-write temporarily only during
>> module_init/module_exit.
>> 
>> Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com>
>> ---
>> kernel/module.c | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>> 
>> diff --git a/kernel/module.c b/kernel/module.c
>> index 7eba6de..3b25e0e 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -987,8 +987,11 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
>> 
>> 	mutex_unlock(&module_mutex);
>> 	/* Final destruction now no one is using it. */
>> -	if (mod->exit != NULL)
>> +	if (mod->exit != NULL) {
>> +		set_ro_mostly_after_init_rw();
>> 		mod->exit();
>> +		set_ro_mostly_after_init_ro();
>> +	}
>> 	blocking_notifier_call_chain(&module_notify_list,
>> 				     MODULE_STATE_GOING, mod);
>> 	klp_module_going(mod);
>> @@ -3396,8 +3399,11 @@ static noinline int do_init_module(struct module *mod)
>> 
>> 	do_mod_ctors(mod);
>> 	/* Start the module */
>> -	if (mod->init != NULL)
>> +	if (mod->init != NULL) {
>> +		set_ro_mostly_after_init_rw();
>> 		ret = do_one_initcall(mod->init);
>> +		set_ro_mostly_after_init_ro();
>> +	}
> 
> This looks very much like the pax_{open,close}_kernel() approach for
> write-rarely data.

I read the discussion [1] and I agree that __ro_mostly_after_init marker
looks very similar to __write_rarely. 

> 
> I think it would be better to implement a first class write-rarely
> mechanism rather than trying to extend __ro_after_init to cover this
> case.

I’m not extending __ro_after_init. __ro_mostly_after_init resides in the same section of rodata though.

> 
> As mentioned previously, I *think* we can have a generic implementation
> that uses an mm to temporarily map a (thread/cpu-local) RW alias of the
> data in question in what would otherwise be the user half of the address
> space. Regardless, we can have a generic interface [1] that can cater
> for that style of approach and/or something like ARM's domains or x86's
> pkeys.
> 

I’m still learning CPU/kernel architecture, so I would be very grateful if you could tell me more about the details of the implementation itself.

Is the mm that maps the temporary RW alias something like this?
    * a special mm, like idmap/init_mm, which has its own page tables
    * its page tables have the same content as init_mm’s swapper_pg_dir, except for RW permissions on a specific section (let’s say __write_rarely)
    * switch_mm(special_rw_mm) changes the address space before the section is accessed
    * switch_mm(current->mm) restores the original address space once the access is done

And about the interface itself: is rare_write(__var, __val) a single-value access interface?
I intend to make the data in the __ro_mostly_after_init section RW across multiple accesses, such as during module_init/exit,
and the __rare_rw_map()/unmap() pair used in rare_write() seems to work like an open/close API.

How could __rare_rw_ptr() be implemented, and what happens when `__rw_var = __rare_rw_ptr(&(__var))` is executed?

Whatever the interface ends up looking like, do we still need a special data section that is normally mapped RO but RW in some cases?
If so, doesn’t the __ro_mostly_after_init marker itself still make sense, and don’t we still need it?

> Thanks,
> Mark.
> 
> [1] http://www.openwall.com/lists/kernel-hardening/2016/11/18/3
Mark Rutland Feb. 21, 2017, 1:58 p.m. UTC | #3
On Tue, Feb 21, 2017 at 10:36:05PM +0900, Ho-Eun Ryu wrote:
> > On 20 Feb 2017, at 7:30 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Sun, Feb 19, 2017 at 07:04:06PM +0900, Hoeun Ryu wrote:

> >> @@ -3396,8 +3399,11 @@ static noinline int do_init_module(struct module *mod)
> >> 
> >> 	do_mod_ctors(mod);
> >> 	/* Start the module */
> >> -	if (mod->init != NULL)
> >> +	if (mod->init != NULL) {
> >> +		set_ro_mostly_after_init_rw();
> >> 		ret = do_one_initcall(mod->init);
> >> +		set_ro_mostly_after_init_ro();
> >> +	}
> > 
> > This looks very much like the pax_{open,close}_kernel() approach for
> > write-rarely data.
> 
> I read the discussion [1] and I agree that __ro_mostly_after_init marker
> looks very similar to __write_rarely. 
> 
> > I think it would be better to implement a first class write-rarely
> > mechanism rather than trying to extend __ro_after_init to cover this
> > case.
> 
> I’m not extending __ro_after_init. __ro_mostly_after_init resides in
> the same section of rodata though.

Sorry; I was confused when I wrote that email. I now understand that
you're adding a separate annotation.

> > As mentioned previously, I *think* we can have a generic implementation
> > that uses an mm to temporarily map a (thread/cpu-local) RW alias of the
> > data in question in what would otherwise be the user half of the address
> > space. Regardless, we can have a generic interface [1] that can cater
> > for that style of approach and/or something like ARM's domains or x86's
> > pkeys.
> > 
> 
> I’m still learning CPU/kernel architecture, so I would be very grateful if you could tell me more about the details of the implementation itself.
> 
> The mm that maps temporary RW alias is like
>     * special mm like idmap/init_mm which have its own page tables?
>     * the page tables have the same content of page tables of
>       init_mm’s swapper_pg_dir except for RW permissions for a
>       specific section (let’s say __write_rarely)

This would be a special mm, like a user mm, that only mapped the
relevant VA(s).

That might map the relevant variable on-demand, or the mapping could
cover the whole write_rarely area.

>     * then use switch_mm(special_rw_mm) to change the address space
>       before the access happens to the section
>     * then use switch_mm(current->mm) to change the address space to
>       original after the access is done

Yes.

> And about the interface itself: is rare_write(__var, __val) a single-value
> access interface?
> I intend to make the data in the __ro_mostly_after_init section RW across
> multiple accesses, such as during module_init/exit,
> and the __rare_rw_map()/unmap() pair used in rare_write() seems to work
> like an open/close API.

The __rare_rw_{map,unmap}() functions would map in the RW alias, but do
not necessarily change the RO alias to RW. This is why __rare_rw_ptr()
would be necessary, and is the major difference to the open/close API.

We could certainly allow several writes between a map/unmap. The key
requirement is that each write is instrumented so that it goes via the
RW alias.

> How could __rare_rw_ptr() be implemented and what happens when
> `__rw_var = __rare_rw_ptr(&(__var))` is done ?

__rare_rw_ptr() would take a pointer to the usual RO alias, and derive
its RW alias. What exactly this should do depends on how the RW alias is
implemented.

On a system using an RW mm, let's assume we place all __write_rarely
variables in a region bounded by __rare_write_start/__rare_write_end,
and that when the mm is installed, we have an RW alias of this region
beginning at __rw_alias_start. In this case, it'd look something like:

#define __rare_rw_ptr(ptr) ({				\
	unsigned long __ptr = (unsigned long)(ptr);	\
	__ptr -= __rare_write_start;			\
	__ptr += __rw_alias_start;			\
	(typeof(ptr))__ptr;				\
})

... does that make sense?
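
To make the aliasing concrete, here is a small user-space sketch of the
same pointer arithmetic. It is an illustration under assumptions, not
kernel code: the two aliases are simply two mmap() views of one temporary
file, and alias_setup() and the lower-case rare_rw_ptr() are hypothetical
names. The point it shows is that the usual mapping stays read-only the
whole time, and only stores routed through the derived pointer succeed.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static uintptr_t rare_write_start;	/* base of the read-only alias */
static uintptr_t rw_alias_start;	/* base of the read-write alias */

/* Translate a pointer into the RO alias to the same offset in the RW alias. */
#define rare_rw_ptr(ptr) \
	((__typeof__(ptr))((uintptr_t)(ptr) - rare_write_start + rw_alias_start))

/* Map one page of a temporary file twice: once RO, once RW. */
static int alias_setup(void)
{
	long len = sysconf(_SC_PAGESIZE);
	FILE *f = tmpfile();
	void *ro, *rw;

	if (!f || ftruncate(fileno(f), len))
		return -1;
	ro = mmap(NULL, (size_t)len, PROT_READ, MAP_SHARED, fileno(f), 0);
	rw = mmap(NULL, (size_t)len, PROT_READ | PROT_WRITE, MAP_SHARED,
		  fileno(f), 0);
	fclose(f);	/* the mappings stay valid after the file is closed */
	if (ro == MAP_FAILED || rw == MAP_FAILED)
		return -1;

	rare_write_start = (uintptr_t)ro;
	rw_alias_start = (uintptr_t)rw;
	return 0;
}
```

After alias_setup(), a store such as `*rare_rw_ptr(p) = 7` for a pointer p
into the RO view becomes visible when reading `*p`, while a direct store
through p would fault.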

For systems where you can freely/easily alter (local) permissions (e.g.
using ARM's domains), that can be done within __rare_rw_{map,unmap}(),
and __rare_rw_ptr can just return the original pointer.

> Whatever the interface ends up looking like, do we still need a special
> data section that is normally mapped RO but RW in some cases?

With the above, I think the usual mapping can always be RO.

> If so, doesn’t the __ro_mostly_after_init marker itself still make sense,
> and don’t we still need it?

We may need a marker to bound the set of variables we wish to map in
this way.

Thanks,
Mark.
Hoeun Ryu Feb. 22, 2017, 1:45 p.m. UTC | #4
Thank you for your detailed explanation. It helped my understanding a lot.

> On Feb 21, 2017, at 10:58 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> 
> On Tue, Feb 21, 2017 at 10:36:05PM +0900, Ho-Eun Ryu wrote:
>>> On 20 Feb 2017, at 7:30 PM, Mark Rutland <mark.rutland@arm.com> wrote:
>>> On Sun, Feb 19, 2017 at 07:04:06PM +0900, Hoeun Ryu wrote:
> 
>>>> @@ -3396,8 +3399,11 @@ static noinline int do_init_module(struct module *mod)
>>>> 
>>>>   do_mod_ctors(mod);
>>>>   /* Start the module */
>>>> -    if (mod->init != NULL)
>>>> +    if (mod->init != NULL) {
>>>> +        set_ro_mostly_after_init_rw();
>>>>       ret = do_one_initcall(mod->init);
>>>> +        set_ro_mostly_after_init_ro();
>>>> +    }
>>> 
>>> This looks very much like the pax_{open,close}_kernel() approach for
>>> write-rarely data.
>> 
>> I read the discussion [1] and I agree that __ro_mostly_after_init marker
>> looks very similar to __write_rarely. 
>> 
>>> I think it would be better to implement a first class write-rarely
>>> mechanism rather than trying to extend __ro_after_init to cover this
>>> case.
>> 
>> I’m not extending __ro_after_init. __ro_mostly_after_init resides in
>> the same section of rodata though.
> 
> Sorry; I was confused when I wrote that email. I now understand that
> you're adding a separate annotation.
> 
>>> As mentioned previously, I *think* we can have a generic implementation
>>> that uses an mm to temporarily map a (thread/cpu-local) RW alias of the
>>> data in question in what would otherwise be the user half of the address
>>> space. Regardless, we can have a generic interface [1] that can cater
>>> for that style of approach and/or something like ARM's domains or x86's
>>> pkeys.
>> 
>> I’m still learning CPU/kernel architecture, so I would be very grateful if you could tell me more about the details of the implementation itself.
>> 
>> The mm that maps temporary RW alias is like
>>   * special mm like idmap/init_mm which have its own page tables?
>>   * the page tables have the same content of page tables of
>>     init_mm’s swapper_pg_dir except for RW permissions for a
>>     specific section (let’s say __write_rarely)
> 
> This would be a special mm, like a user mm, that only mapped the
> relevant VA(s).

So we need a separate mm/pgd for ttbr0_el1, placed in the kernel image as idmap and swapper_pg_dir currently are, and during kernel init we create a VA alias mapping of the RO section with RW permission below TASK_SIZE. We can then switch to that mm by installing its pgd in ttbr0_el1. Right?

One thing that came to mind is how this interacts with SW_TTBR0_PAN.
What if copy_from_user() tries to write through the RW alias?

val_rw = __rw_ptr(&val);
__rw_map();
copy_from_user(val_rw, user_ptr, sizeof(val));
__rw_unmap();

__rw_map() will install rw_mm->pgd in ttbr0_el1, but uaccess_enable() will immediately reinstall thread_info->pgd in ttbr0_el1, and we lose the RW alias.
Am I wrong or confused about something?
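
To spell out the sequence I am worried about, here is a toy model in plain
C. It is only a sketch of my understanding, not real arm64 code: the single
field ttbr0 stands in for ttbr0_el1, and every name here (cpu_model,
rw_map(), uaccess_enable_model(), uaccess_disable_model()) is hypothetical.
It assumes that with SW_TTBR0_PAN, enabling user access installs the
thread's user pgd and disabling it installs a reserved pgd.

```c
#include <assert.h>

/* Which page table the toy "ttbr0" currently points at. */
enum pgd_id { PGD_RESERVED, PGD_USER, PGD_RW_ALIAS };

struct cpu_model {
	enum pgd_id ttbr0;
};

/* Model of __rw_map(): install the RW-alias mm in ttbr0. */
static void rw_map(struct cpu_model *cpu)
{
	cpu->ttbr0 = PGD_RW_ALIAS;
}

/* Assumed SW_TTBR0_PAN behaviour: user access installs the user pgd. */
static void uaccess_enable_model(struct cpu_model *cpu)
{
	cpu->ttbr0 = PGD_USER;
}

/* Assumed SW_TTBR0_PAN behaviour: afterwards ttbr0 points at a reserved pgd. */
static void uaccess_disable_model(struct cpu_model *cpu)
{
	cpu->ttbr0 = PGD_RESERVED;
}

/* Returns 1 if the RW alias is still mapped while the user copy runs. */
static int alias_visible_during_copy(struct cpu_model *cpu)
{
	int visible;

	rw_map(cpu);
	uaccess_enable_model(cpu);	/* first step of copy_from_user() */
	visible = (cpu->ttbr0 == PGD_RW_ALIAS);
	uaccess_disable_model(cpu);	/* last step of copy_from_user() */
	return visible;
}
```

In this model alias_visible_during_copy() returns 0, and the RW alias is
not restored afterwards either.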

> 
> That might map the relevant variable on-demand, or the mapping could
> cover the whole write_rarely area.
> 
>>   * then use switch_mm(special_rw_mm) to change the address space
>>     before the access happens to the section
>>   * then use switch_mm(current->mm) to change the address space to
>>     original after the access is done
> 
> Yes.
> 
>> And about the interface itself: is rare_write(__var, __val) a single-value
>> access interface?
>> I intend to make the data in the __ro_mostly_after_init section RW across
>> multiple accesses, such as during module_init/exit,
>> and the __rare_rw_map()/unmap() pair used in rare_write() seems to work
>> like an open/close API.
> 
> The __rare_rw_{map,unmap}() functions would map in the RW alias, but do
> not necessarily change the RO alias to RW. This is why __rare_rw_ptr()
> would be necessary, and is the major difference to the open/close API.
> 
> We could certainly allow several writes between a map/unmap. The key
> requirement is that each write is instrumented so that it goes via the
> RW alias.
> 
>> How could __rare_rw_ptr() be implemented and what happens when
>> `__rw_var = __rare_rw_ptr(&(__var))` is done ?
> 
> __rare_rw_ptr() would take a pointer to the usual RO alias, and derive
> its RW alias. What exactly this should do depends on how the RW alias is
> implemented.
> 
> On a system using an RW mm, let's assume we place all __write_rarely
> variables in a region bounded by __rare_write_start/__rare_write_end,
> and that when the mm is installed, we have an RW alias of this region
> beginning at __rw_alias_start. In this case, it'd look something like:
> 
> #define __rare_rw_ptr(ptr) ({                \
>   unsigned long __ptr = (unsigned long)(ptr);    \
>   __ptr -= __rare_write_start;            \
>   __ptr += __rw_alias_start;            \
>   (typeof(ptr))__ptr;                \
> })
> 
> ... does that make sense?

Yes. Cool.

> 
> For systems where you can freely/easily alter (local) permissions (e.g.
> using ARM's domains), that can be done within __rare_rw_{map,unmap}(),
> and __rare_rw_ptr can just return the original pointer.
> 
>> However the interface will look like, Do we still need a special data
>> section that is mapped RO in general but RW in some cases ?
> 
> With the above, I think the usual mapping can always be RO.
> 
>> if then, doesn’t __ro_mostly_after_init marker itself make sense and
>> we still need it ?
> 
> We may need a marker to bound the set of variables we wish to map in
> this way.
> 
> Thanks,
> Mark.

Patch

diff --git a/kernel/module.c b/kernel/module.c
index 7eba6de..3b25e0e 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -987,8 +987,11 @@  SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
 
 	mutex_unlock(&module_mutex);
 	/* Final destruction now no one is using it. */
-	if (mod->exit != NULL)
+	if (mod->exit != NULL) {
+		set_ro_mostly_after_init_rw();
 		mod->exit();
+		set_ro_mostly_after_init_ro();
+	}
 	blocking_notifier_call_chain(&module_notify_list,
 				     MODULE_STATE_GOING, mod);
 	klp_module_going(mod);
@@ -3396,8 +3399,11 @@  static noinline int do_init_module(struct module *mod)
 
 	do_mod_ctors(mod);
 	/* Start the module */
-	if (mod->init != NULL)
+	if (mod->init != NULL) {
+		set_ro_mostly_after_init_rw();
 		ret = do_one_initcall(mod->init);
+		set_ro_mostly_after_init_ro();
+	}
 	if (ret < 0) {
 		goto fail_free_freeinit;
 	}