Message ID | 1487498660-16600-3-git-send-email-hoeun.ryu@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On Sun, Feb 19, 2017 at 07:04:06PM +0900, Hoeun Ryu wrote:
> `__ro_mostly_after_init` is almost like `__ro_after_init`. The section is
> read-only as same as `__ro_after_init` after kernel init. This patch makes
> `__ro_mostly_after_init` section read-write temporarily only during
> module_init/module_exit.
>
> Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com>
> ---
>  kernel/module.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index 7eba6de..3b25e0e 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -987,8 +987,11 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
>
>  	mutex_unlock(&module_mutex);
>  	/* Final destruction now no one is using it. */
> -	if (mod->exit != NULL)
> +	if (mod->exit != NULL) {
> +		set_ro_mostly_after_init_rw();
>  		mod->exit();
> +		set_ro_mostly_after_init_ro();
> +	}
>  	blocking_notifier_call_chain(&module_notify_list,
>  				     MODULE_STATE_GOING, mod);
>  	klp_module_going(mod);
> @@ -3396,8 +3399,11 @@ static noinline int do_init_module(struct module *mod)
>
>  	do_mod_ctors(mod);
>  	/* Start the module */
> -	if (mod->init != NULL)
> +	if (mod->init != NULL) {
> +		set_ro_mostly_after_init_rw();
>  		ret = do_one_initcall(mod->init);
> +		set_ro_mostly_after_init_ro();
> +	}

This looks very much like the pax_{open,close}_kernel() approach for
write-rarely data.

I think it would be better to implement a first class write-rarely
mechanism rather than trying to extend __ro_after_init to cover this
case.

As mentioned previously, I *think* we can have a generic implementation
that uses an mm to temporarily map a (thread/cpu-local) RW alias of the
data in question in what would otherwise be the user half of the address
space. Regardless, we can have a generic interface [1] that can cater
for that style of approach and/or something like ARM's domains or x86's
pkeys.

Thanks,
Mark.

[1] http://www.openwall.com/lists/kernel-hardening/2016/11/18/3
> On 20 Feb 2017, at 7:30 PM, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Sun, Feb 19, 2017 at 07:04:06PM +0900, Hoeun Ryu wrote:
>> `__ro_mostly_after_init` is almost like `__ro_after_init`. The section is
>> read-only as same as `__ro_after_init` after kernel init. This patch makes
>> `__ro_mostly_after_init` section read-write temporarily only during
>> module_init/module_exit.
>>
>> Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com>
>> ---
>>  kernel/module.c | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/module.c b/kernel/module.c
>> index 7eba6de..3b25e0e 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -987,8 +987,11 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
>>
>>  	mutex_unlock(&module_mutex);
>>  	/* Final destruction now no one is using it. */
>> -	if (mod->exit != NULL)
>> +	if (mod->exit != NULL) {
>> +		set_ro_mostly_after_init_rw();
>>  		mod->exit();
>> +		set_ro_mostly_after_init_ro();
>> +	}
>>  	blocking_notifier_call_chain(&module_notify_list,
>>  				     MODULE_STATE_GOING, mod);
>>  	klp_module_going(mod);
>> @@ -3396,8 +3399,11 @@ static noinline int do_init_module(struct module *mod)
>>
>>  	do_mod_ctors(mod);
>>  	/* Start the module */
>> -	if (mod->init != NULL)
>> +	if (mod->init != NULL) {
>> +		set_ro_mostly_after_init_rw();
>>  		ret = do_one_initcall(mod->init);
>> +		set_ro_mostly_after_init_ro();
>> +	}
>
> This looks very much like the pax_{open,close}_kernel() approach for
> write-rarely data.

I read the discussion [1] and I agree that __ro_mostly_after_init marker looks very similar to __write_rarely.

>
> I think it would be better to implement a first class write-rarely
> mechanism rather than trying to extend __ro_after_init to cover this
> case.

I’m not extending __ro_after_init. __ro_mostly_after_init resides in the same section of rodata though.
>
> As mentioned previously, I *think* we can have a generic implementation
> that uses an mm to temporarily map a (thread/cpu-local) RW alias of the
> data in question in what would otherwise be the user half of the address
> space. Regardless, we can have a generic interface [1] that can cater
> for that style of approach and/or something like ARM's domains or x86's
> pkeys.
>

I’m still learning cpu/kernel architectures, I would be very thankful if you tell me more about the detail of the implementation itself.

The mm that maps temporary RW alias is like
* special mm like idmap/init_mm which have its own page tables?
* the page tables have the same content of page tables of init_mm’s swapper_pg_dir except for RW permissions for a specific section (let’s say __write_rarely)
* then use switch_mm(special_rw_mm) to change the address space before the access happens to the section
* then use switch_mm(current->mm) to change the address space to original after the access is done

And the interface itself. rare_write(__var, __val), is it a single value access interface.
I’m intending to make data in __ro_mostly_after_init section RW during multiple accesses like during module_init/exit.
and __rare_rw_map()/unmap() used in rare_write() seems to work like open/close api.

How could __rare_rw_ptr() be implemented and what happens when `__rw_var = __rare_rw_ptr(&(__var))` is done ?

However the interface will look like, Do we still need a special data section that is mapped RO in general but RW in some cases ?
if then, doesn’t __ro_mostly_after_init marker itself make sense and we still need it ?

> Thanks,
> Mark.
>
> [1] http://www.openwall.com/lists/kernel-hardening/2016/11/18/3
On Tue, Feb 21, 2017 at 10:36:05PM +0900, Ho-Eun Ryu wrote:
> > On 20 Feb 2017, at 7:30 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Sun, Feb 19, 2017 at 07:04:06PM +0900, Hoeun Ryu wrote:

> >> @@ -3396,8 +3399,11 @@ static noinline int do_init_module(struct module *mod)
> >>
> >>  	do_mod_ctors(mod);
> >>  	/* Start the module */
> >> -	if (mod->init != NULL)
> >> +	if (mod->init != NULL) {
> >> +		set_ro_mostly_after_init_rw();
> >>  		ret = do_one_initcall(mod->init);
> >> +		set_ro_mostly_after_init_ro();
> >> +	}
> >
> > This looks very much like the pax_{open,close}_kernel() approach for
> > write-rarely data.
>
> I read the discussion [1] and I agree that __ro_mostly_after_init marker
> looks very similar to __write_rarely.
>
> > I think it would be better to implement a first class write-rarely
> > mechanism rather than trying to extend __ro_after_init to cover this
> > case.
>
> I’m not extending __ro_after_init. __ro_mostly_after_init resides in
> the same section of rodata though.

Sorry; I was confused when I wrote that email. I now understand that
you're adding a separate annotation.

> > As mentioned previously, I *think* we can have a generic implementation
> > that uses an mm to temporarily map a (thread/cpu-local) RW alias of the
> > data in question in what would otherwise be the user half of the address
> > space. Regardless, we can have a generic interface [1] that can cater
> > for that style of approach and/or something like ARM's domains or x86's
> > pkeys.
> >
>
> I’m still learning cpu/kernel architectures, I would be very thankful if you tell me more about the detail of the implementation itself.
>
> The mm that maps temporary RW alias is like
> * special mm like idmap/init_mm which have its own page tables?
> * the page tables have the same content of page tables of
>   init_mm’s swapper_pg_dir except for RW permissions for a
>   specific section (let’s say __write_rarely)

This would be a special mm, like a user mm, that only mapped the
relevant VA(s).

That might map the relevant variable on-demand, or the mapping could
cover the whole write_rarely area.

> * then use switch_mm(special_rw_mm) to change the address space
>   before the access happens to the section
> * then use switch_mm(current->mm) to change the address space to
>   original after the access is done

Yes.

> And the interface itself. rare_write(__var, __val), is it a single
> value access interface.
> I’m intending to make data in __ro_mostly_after_init section RW during
> multiple accesses like during module_init/exit.
> and __rare_rw_map()/unmap() used in rare_write() seems to work like
> open/close api.

The __rare_rw_{map,unmap}() functions would map in the RW alias, but do
not necessarily change the RO alias to RW. This is why __rare_rw_ptr()
would be necessary, and is the major difference to the open/close API.

We could certainly allow several writes between a map/unmap. The key
requirement is that each write is instrumented so that it goes via the
RW alias.

> How could __rare_rw_ptr() be implemented and what happens when
> `__rw_var = __rare_rw_ptr(&(__var))` is done ?

__rare_rw_ptr() would take a pointer to the usual RO alias, and derive
its RW alias. What exactly this should do depends on how the RW alias is
implemented.

On a system using an RW mm, let's assume we place all __write_rarely
variables in a region bounded by __rare_write_start/__rare_write_end,
and when the mm is installed, we have an RW alias of this region
beginning at __rw_alias_start. In this case, it'd look something like:

	#define __rare_rw_ptr(ptr) ({ \
		unsigned long __ptr = (unsigned long)(ptr); \
		__ptr -= __rare_write_start; \
		__ptr += __rw_alias_start; \
		(typeof(ptr))__ptr; \
	})

... does that make sense?
For systems where you can freely/easily alter (local) permissions (e.g.
using ARM's domains), that can be done within __rare_rw_{map,unmap}(),
and __rare_rw_ptr can just return the original pointer.

> However the interface will look like, Do we still need a special data
> section that is mapped RO in general but RW in some cases ?

With the above, I think the usual mapping can always be RO.

> if then, doesn’t __ro_mostly_after_init marker itself make sense and
> we still need it ?

We may need a marker to bound the set of variables we wish to map in
this way.

Thanks,
Mark.
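A minimal user-space sketch of the alias arithmetic described in the message above, assuming a POSIX system where one backing file can be mapped twice. It is an analogy rather than kernel code: the two `mmap()` views stand in for the usual RO mapping and the RW alias, and the names `rare_write_start`, `rw_alias_start`, `rare_rw_ptr`, `rare_region_init` and `rare_write_demo` are hypothetical stand-ins invented for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-ins for the region symbols named in the thread. */
static uintptr_t rare_write_start; /* base of the read-only view */
static uintptr_t rw_alias_start;   /* base of the writable alias */

/* Same arithmetic as the macro in the email: rebase an RO pointer onto the alias. */
#define rare_rw_ptr(ptr) \
	((__typeof__(ptr))((uintptr_t)(ptr) - rare_write_start + rw_alias_start))

/* Map one page of a temporary file twice: read-only and read-write. */
static int rare_region_init(void)
{
	long page = sysconf(_SC_PAGESIZE);
	FILE *backing = tmpfile(); /* left open on purpose; it backs both views */
	int fd;
	void *ro, *rw;

	if (backing == NULL || page <= 0)
		return -1;
	fd = fileno(backing);
	if (ftruncate(fd, page) != 0)
		return -1;
	ro = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
	rw = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (ro == MAP_FAILED || rw == MAP_FAILED)
		return -1;
	rare_write_start = (uintptr_t)ro;
	rw_alias_start = (uintptr_t)rw;
	return 0;
}

/* Store through the alias, then read the value back through the RO view. */
static int rare_write_demo(int value)
{
	const int *ro_var;
	int *rw_var;

	if (rare_write_start == 0 && rare_region_init() != 0)
		return -1;
	ro_var = (const int *)rare_write_start; /* first int in the region */
	rw_var = (int *)rare_rw_ptr(ro_var);    /* same data, writable address */
	*rw_var = value;
	return *ro_var;
}
```

In this sketch the read-only view never changes permissions; only code that deliberately derives the alias address can write, which is the property the message contrasts with an open/close style API.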
Thank you for your detailed explanation. It helped my understanding a lot.

> On Feb 21, 2017, at 10:58 PM, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Feb 21, 2017 at 10:36:05PM +0900, Ho-Eun Ryu wrote:
>>> On 20 Feb 2017, at 7:30 PM, Mark Rutland <mark.rutland@arm.com> wrote:
>>> On Sun, Feb 19, 2017 at 07:04:06PM +0900, Hoeun Ryu wrote:
>
>>>> @@ -3396,8 +3399,11 @@ static noinline int do_init_module(struct module *mod)
>>>>
>>>>  	do_mod_ctors(mod);
>>>>  	/* Start the module */
>>>> -	if (mod->init != NULL)
>>>> +	if (mod->init != NULL) {
>>>> +		set_ro_mostly_after_init_rw();
>>>>  		ret = do_one_initcall(mod->init);
>>>> +		set_ro_mostly_after_init_ro();
>>>> +	}
>>>
>>> This looks very much like the pax_{open,close}_kernel() approach for
>>> write-rarely data.
>>
>> I read the discussion [1] and I agree that __ro_mostly_after_init marker
>> looks very similar to __write_rarely.
>>
>>> I think it would be better to implement a first class write-rarely
>>> mechanism rather than trying to extend __ro_after_init to cover this
>>> case.
>>
>> I’m not extending __ro_after_init. __ro_mostly_after_init resides in
>> the same section of rodata though.
>
> Sorry; I was confused when I wrote that email. I now understand that
> you're adding a separate annotation.
>
>>> As mentioned previously, I *think* we can have a generic implementation
>>> that uses an mm to temporarily map a (thread/cpu-local) RW alias of the
>>> data in question in what would otherwise be the user half of the address
>>> space. Regardless, we can have a generic interface [1] that can cater
>>> for that style of approach and/or something like ARM's domains or x86's
>>> pkeys.
>>
>> I’m still learning cpu/kernel architectures, I would be very thankful if you tell me more about the detail of the implementation itself.
>>
>> The mm that maps temporary RW alias is like
>> * special mm like idmap/init_mm which have its own page tables?
>> * the page tables have the same content of page tables of
>>   init_mm’s swapper_pg_dir except for RW permissions for a
>>   specific section (let’s say __write_rarely)
>
> This would be a special mm, like a user mm, that only mapped the
> relevant VA(s).

We would need a separate mm/pgd for ttbr0_el1, kept in the kernel image as idmap and swapper_pg_dir currently are, and during kernel init we would create the RW alias mapping of the RO section below TASK_SIZE.
Then we can switch to that mm by writing its pgd to ttbr0_el1. Right?

It also occurred to me to ask how this interacts with SW_TTBR0_PAN.
What if copy_from_user() is used to write through the RW alias?

	val_rw = __rw_ptr(&val);
	__rw_map();
	copy_from_user(val_rw, user_ptr, sizeof(val));
	__rw_unmap();

__rw_map() will install rw_mm->pgd in ttbr0_el1, but uaccess_enable() will immediately reinstall thread_info->pgd in ttbr0_el1 and we lose the RW alias.
Am I wrong or confused about something?

>
> That might map the relevant variable on-demand, or the mapping could
> cover the whole write_rarely area.
>
>> * then use switch_mm(special_rw_mm) to change the address space
>>   before the access happens to the section
>> * then use switch_mm(current->mm) to change the address space to
>>   original after the access is done
>
> Yes.
>
>> And the interface itself. rare_write(__var, __val), is it a single
>> value access interface.
>> I’m intending to make data in __ro_mostly_after_init section RW during
>> multiple accesses like during module_init/exit.
>> and __rare_rw_map()/unmap() used in rare_write() seems to work like
>> open/close api.
>
> The __rare_rw_{map,unmap}() functions would map in the RW alias, but do
> not necessarily change the RO alias to RW. This is why __rare_rw_ptr()
> would be necessary, and is the major difference to the open/close API.
>
> We could certainly allow several writes between a map/unmap. The key
> requirement is that each write is instrumented so that it goes via the
> RW alias.
>
>> How could __rare_rw_ptr() be implemented and what happens when
>> `__rw_var = __rare_rw_ptr(&(__var))` is done ?
>
> __rare_rw_ptr() would take a pointer to the usual RO alias, and derive
> its RW alias. What exactly this should do depends on how the RW alias is
> implemented.
>
> On a system using an RW mm, let's assume we place all __write_rarely
> variables in a region bounded by __rare_write_start/__rare_write_end,
> and when the mm is installed, we have an RW alias of this region
> beginning at __rw_alias_start. In this case, it'd look something like:
>
> 	#define __rare_rw_ptr(ptr) ({ \
> 		unsigned long __ptr = (unsigned long)(ptr); \
> 		__ptr -= __rare_write_start; \
> 		__ptr += __rw_alias_start; \
> 		(typeof(ptr))__ptr; \
> 	})
>
> ... does that make sense?

Yes. Cool.

>
> For systems where you can freely/easily alter (local) permissions (e.g.
> using ARM's domains), that can be done within __rare_rw_{map,unmap}(),
> and __rare_rw_ptr can just return the original pointer.
>
>> However the interface will look like, Do we still need a special data
>> section that is mapped RO in general but RW in some cases ?
>
> With the above, I think the usual mapping can always be RO.
>
>> if then, doesn’t __ro_mostly_after_init marker itself make sense and
>> we still need it ?
>
> We may need a marker to bound the set of variables we wish to map in
> this way.
>
> Thanks,
> Mark.
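The SW_TTBR0_PAN question raised in the message above can be followed as a toy state model. This is a sketch of the concern exactly as stated there, not of real arm64 code: the enum, the struct and every function name below are hypothetical, and the model assumes that the uaccess enable step simply reinstalls the thread's saved user page table.

```c
#include <assert.h>

/* Which page table the modelled TTBR0 register currently points at. */
enum pgd_id { PGD_RESERVED, PGD_USER, PGD_RW_ALIAS };

struct cpu_model {
	enum pgd_id ttbr0;      /* modelled register contents */
	enum pgd_id thread_pgd; /* per-thread saved user page table */
};

/* Hypothetical map/unmap: install or drop the RW alias page table. */
static void model_rw_map(struct cpu_model *c)   { c->ttbr0 = PGD_RW_ALIAS; }
static void model_rw_unmap(struct cpu_model *c) { c->ttbr0 = PGD_RESERVED; }

/* Assumed behaviour: user access enable reinstalls the thread's user pgd. */
static void model_uaccess_enable(struct cpu_model *c)  { c->ttbr0 = c->thread_pgd; }
static void model_uaccess_disable(struct cpu_model *c) { c->ttbr0 = PGD_RESERVED; }

/* Returns 1 if the RW alias is still installed while the user copy would run. */
static int alias_live_during_user_copy(void)
{
	struct cpu_model c = { PGD_RESERVED, PGD_USER };
	int live;

	model_rw_map(&c);         /* alias installed */
	model_uaccess_enable(&c); /* start of the modelled copy_from_user() */
	live = (c.ttbr0 == PGD_RW_ALIAS);
	model_uaccess_disable(&c);
	model_rw_unmap(&c);
	return live;
}
```

Under these assumptions the alias has already been replaced by the time the copy runs, which is the situation the message asks about. The model says nothing about how an actual implementation would handle it.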
diff --git a/kernel/module.c b/kernel/module.c
index 7eba6de..3b25e0e 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -987,8 +987,11 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
 
 	mutex_unlock(&module_mutex);
 	/* Final destruction now no one is using it. */
-	if (mod->exit != NULL)
+	if (mod->exit != NULL) {
+		set_ro_mostly_after_init_rw();
 		mod->exit();
+		set_ro_mostly_after_init_ro();
+	}
 	blocking_notifier_call_chain(&module_notify_list,
 				     MODULE_STATE_GOING, mod);
 	klp_module_going(mod);
@@ -3396,8 +3399,11 @@ static noinline int do_init_module(struct module *mod)
 
 	do_mod_ctors(mod);
 	/* Start the module */
-	if (mod->init != NULL)
+	if (mod->init != NULL) {
+		set_ro_mostly_after_init_rw();
 		ret = do_one_initcall(mod->init);
+		set_ro_mostly_after_init_ro();
+	}
 	if (ret < 0) {
 		goto fail_free_freeinit;
 	}
`__ro_mostly_after_init` is almost like `__ro_after_init`. The section is
read-only as same as `__ro_after_init` after kernel init. This patch makes
`__ro_mostly_after_init` section read-write temporarily only during
module_init/module_exit.

Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com>
---
 kernel/module.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
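The bracketing that this patch adds around `mod->init` and `mod->exit()` can be tried out in user space with `mprotect()`. The following is a rough analogy under stated assumptions, not the kernel implementation: `mostly_ro`, `region_setup`, `set_region_rw`, `set_region_ro`, `demo_init` and `run_bracketed` are hypothetical names, and an anonymous page stands in for the `__ro_mostly_after_init` section.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static int *mostly_ro;    /* hypothetical protected data: one page of ints */
static size_t region_len; /* length of that page */

/* Allocate one zero-filled page and leave it read-only. */
static int region_setup(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	if (page <= 0)
		return -1;
	region_len = (size_t)page;
	p = mmap(NULL, region_len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	mostly_ro = p;
	return 0;
}

/* User-space stand-ins for the rw/ro toggles used in the patch. */
static int set_region_rw(void)
{
	return mprotect(mostly_ro, region_len, PROT_READ | PROT_WRITE);
}

static int set_region_ro(void)
{
	return mprotect(mostly_ro, region_len, PROT_READ);
}

/* Stand-in for a module init function that updates the protected data. */
static int demo_init(void)
{
	mostly_ro[0] = 1;
	return 0;
}

/* Open the write window, run the callback, close the window again. */
static int run_bracketed(int (*fn)(void))
{
	int ret;

	if (set_region_rw() != 0)
		return -1;
	ret = fn();
	if (set_region_ro() != 0)
		return -1;
	return ret;
}
```

Calling `region_setup()` once and then `run_bracketed(demo_init)` performs the write inside the window. Note that `mprotect()` changes the permissions of the ordinary address for the whole process, so anything that runs during the window can write to the page.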