Message ID | cover.1549927666.git.igor.stoppa@huawei.com (mailing list archive) |
---|---|
Headers | show |
Series | hardening: statically allocated protected memory | expand |
On Mon, Feb 11, 2019 at 3:28 PM Igor Stoppa <igor.stoppa@gmail.com> wrote: > at last I'm able to resume work on the memory protection patchset I've > proposed some time ago. This version should address comments received so > far and introduce support for arm64. Details below. Cool! > Patch-set implementing write-rare memory protection for statically > allocated data. It seems like this could be expanded in the future to cover dynamic memory too (i.e. just a separate base range in the mm). > Its purpose is to keep write protected the kernel data which is seldom > modified, especially if altering it can be exploited during an attack. > > There is no read overhead, however writing requires special operations that > are probably unsuitable for often-changing data. > The use is opt-in, by applying the modifier __wr_after_init to a variable > declaration. > > As the name implies, the write protection kicks in only after init() is > completed; before that moment, the data is modifiable in the usual way. > > Current Limitations: > * supports only data which is allocated statically, at build time. > * supports only x86_64 and arm64;other architectures need to provide own > backend It looked like only the memset() needed architecture support. Is there a reason for not being able to implement memset() in terms of an inefficient put_user() loop instead? That would eliminate the need for per-arch support, yes? > - I've added a simple example: the protection of ima_policy_flags You'd also looked at SELinux too, yes? What other things could be targeted for protection? (It seems we can't yet protect page tables themselves with this...) > - the x86_64 user space address range is double the size of the kernel > address space, so it's possible to randomize the beginning of the > mapping of the kernel address space, but on arm64 they have the same > size, so it's not possible to do the same Only the wr_rare section needs mapping, though, yes? > - I'm not sure if it's correct, since it doesn't seem to be that common in > kernel sources, but instead of using #defines for overriding default > function calls, I'm using "weak" for the default functions. The tradition is to use #defines for easier readability, but "weak" continues to be a thing. *shrug* This will be a nice addition to protect more of the kernel's static data from write-what-where attacks. :)
On 12/02/2019 02:09, Kees Cook wrote: > On Mon, Feb 11, 2019 at 3:28 PM Igor Stoppa <igor.stoppa@gmail.com> wrote: [...] >> Patch-set implementing write-rare memory protection for statically >> allocated data. > > It seems like this could be expanded in the future to cover dynamic > memory too (i.e. just a separate base range in the mm). Indeed. And part of the code refactoring is also geared in that direction. I am working on that part, but it was agreed that I would first provide this subset of features covering statically allocated memory. So I'm sticking to the plan. But this is roughly 1/3 of the basic infra I have in mind. >> Its purpose is to keep write protected the kernel data which is seldom >> modified, especially if altering it can be exploited during an attack. >> >> There is no read overhead, however writing requires special operations that >> are probably unsuitable for often-changing data. >> The use is opt-in, by applying the modifier __wr_after_init to a variable >> declaration. >> >> As the name implies, the write protection kicks in only after init() is >> completed; before that moment, the data is modifiable in the usual way. >> >> Current Limitations: >> * supports only data which is allocated statically, at build time. >> * supports only x86_64 and arm64;other architectures need to provide own >> backend > > It looked like only the memset() needed architecture support. Is there > a reason for not being able to implement memset() in terms of an > inefficient put_user() loop instead? That would eliminate the need for > per-arch support, yes? So far, yes, however from previous discussion about power arch, I understood this implementation would not be so easy to adapt. Lacking other examples where the extra mapping could be used, I did not want to add code without a use case. Probably both arm and x86 32 bit could do, but I would like to first get to the bitter end with memory protection (the other 2 thirds). Mostly, I hated having just one arch and I also really wanted to have arm64. But eventually, yes, a generic put_user() loop could do, provided that there are other arch where the extra mapping to user space would be a good way to limit write access. This last part is what I'm not sure of. >> - I've added a simple example: the protection of ima_policy_flags > > You'd also looked at SELinux too, yes? What other things could be > targeted for protection? (It seems we can't yet protect page tables > themselves with this...) Yes, I have. See the "1/3" explanation above. I'm also trying to get away with as small example as possible, to get the basic infra merged. SELinux is not going to be a small patch set. I'd rather move to it once at least some of the framework is merged. It might be a good use case for dynamic allocation, if I do not find something smaller. But for static write rare, going after IMA was easier, and it is still a good target for protection, imho, as flipping this variable should be sufficient for turning IMA off. For the page tables, I have in mind a little bit different approach, that I hope to explain better once I get to do the dynamic allocation. >> - the x86_64 user space address range is double the size of the kernel >> address space, so it's possible to randomize the beginning of the >> mapping of the kernel address space, but on arm64 they have the same >> size, so it's not possible to do the same > > Only the wr_rare section needs mapping, though, yes? Yup, however, once more, I'm not so keen to do what seems as premature optimization, before I have addressed the framework in its entirety, as the dynamic allocation will need similar treatment. >> - I'm not sure if it's correct, since it doesn't seem to be that common in >> kernel sources, but instead of using #defines for overriding default >> function calls, I'm using "weak" for the default functions. > > The tradition is to use #defines for easier readability, but "weak" > continues to be a thing. *shrug* Yes, I wasn't so sure about it, but I kinda liked the fact that, by using "weak", the arch header becomes optional, unless one has to redefine the struct wr_state. > This will be a nice addition to protect more of the kernel's static > data from write-what-where attacks. :) I hope so :-) -- thanks, igor
On Mon, Feb 11, 2019 at 4:37 PM Igor Stoppa <igor.stoppa@gmail.com> wrote: > > > > On 12/02/2019 02:09, Kees Cook wrote: > > On Mon, Feb 11, 2019 at 3:28 PM Igor Stoppa <igor.stoppa@gmail.com> wrote: > > It looked like only the memset() needed architecture support. Is there > > a reason for not being able to implement memset() in terms of an > > inefficient put_user() loop instead? That would eliminate the need for > > per-arch support, yes? > > So far, yes, however from previous discussion about power arch, I > understood this implementation would not be so easy to adapt. > Lacking other examples where the extra mapping could be used, I did not > want to add code without a use case. > > Probably both arm and x86 32 bit could do, but I would like to first get > to the bitter end with memory protection (the other 2 thirds). > > Mostly, I hated having just one arch and I also really wanted to have arm64. Right, I meant, if you implemented the _memset() case with put_user() in this version, you could drop the arch-specific _memset() and shrink the patch series. Then you could also enable this across all the architectures in one patch. (Would you even need the Kconfig patches, i.e. won't this "Just Work" on everything with an MMU?)
On Tue, 12 Feb 2019, 4.47 Kees Cook <keescook@chromium.org wrote: > On Mon, Feb 11, 2019 at 4:37 PM Igor Stoppa <igor.stoppa@gmail.com> wrote: > > > > > > > > On 12/02/2019 02:09, Kees Cook wrote: > > > On Mon, Feb 11, 2019 at 3:28 PM Igor Stoppa <igor.stoppa@gmail.com> > wrote: > > > It looked like only the memset() needed architecture support. Is there > > > a reason for not being able to implement memset() in terms of an > > > inefficient put_user() loop instead? That would eliminate the need for > > > per-arch support, yes? > > > > So far, yes, however from previous discussion about power arch, I > > understood this implementation would not be so easy to adapt. > > Lacking other examples where the extra mapping could be used, I did not > > want to add code without a use case. > > > > Probably both arm and x86 32 bit could do, but I would like to first get > > to the bitter end with memory protection (the other 2 thirds). > > > > Mostly, I hated having just one arch and I also really wanted to have > arm64. > > Right, I meant, if you implemented the _memset() case with put_user() > in this version, you could drop the arch-specific _memset() and shrink > the patch series. Then you could also enable this across all the > architectures in one patch. (Would you even need the Kconfig patches, > i.e. won't this "Just Work" on everything with an MMU?) > I had similar thoughts, but this answer [1] deflated my hopes (if I understood it correctly). It seems that each arch needs to be massaged in separately. -- igor [1] https://www.openwall.com/lists/kernel-hardening/2018/12/12/15 > <div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr">On Tue, 12 Feb 2019, 4.47 Kees Cook <<a href="mailto:keescook@chromium.org">keescook@chromium.org</a> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Mon, Feb 11, 2019 at 4:37 PM Igor Stoppa <<a href="mailto:igor.stoppa@gmail.com" target="_blank" rel="noreferrer">igor.stoppa@gmail.com</a>> wrote:<br> ><br> ><br> ><br> > On 12/02/2019 02:09, Kees Cook wrote:<br> > > On Mon, Feb 11, 2019 at 3:28 PM Igor Stoppa <<a href="mailto:igor.stoppa@gmail.com" target="_blank" rel="noreferrer">igor.stoppa@gmail.com</a>> wrote:<br> > > It looked like only the memset() needed architecture support. Is there<br> > > a reason for not being able to implement memset() in terms of an<br> > > inefficient put_user() loop instead? That would eliminate the need for<br> > > per-arch support, yes?<br> ><br> > So far, yes, however from previous discussion about power arch, I<br> > understood this implementation would not be so easy to adapt.<br> > Lacking other examples where the extra mapping could be used, I did not<br> > want to add code without a use case.<br> ><br> > Probably both arm and x86 32 bit could do, but I would like to first get<br> > to the bitter end with memory protection (the other 2 thirds).<br> ><br> > Mostly, I hated having just one arch and I also really wanted to have arm64.<br> <br> Right, I meant, if you implemented the _memset() case with put_user()<br> in this version, you could drop the arch-specific _memset() and shrink<br> the patch series. Then you could also enable this across all the<br> architectures in one patch. (Would you even need the Kconfig patches,<br> i.e. won't this "Just Work" on everything with an MMU?)<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">I had similar thoughts, but this answer [1] deflated my hopes (if I understood it correctly).</div><div dir="auto">It seems that each arch needs to be massaged in separately. </div><div dir="auto"><br></div><div dir="auto">--</div><div dir="auto">igor</div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto">[1] <a href="https://www.openwall.com/lists/kernel-hardening/2018/12/12/15">https://www.openwall.com/lists/kernel-hardening/2018/12/12/15</a></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> </blockquote></div></div></div>
On Mon, Feb 11, 2019 at 5:08 PM igor.stoppa@gmail.com <igor.stoppa@gmail.com> wrote: > > > > On Tue, 12 Feb 2019, 4.47 Kees Cook <keescook@chromium.org wrote: >> >> On Mon, Feb 11, 2019 at 4:37 PM Igor Stoppa <igor.stoppa@gmail.com> wrote: >> > >> > >> > >> > On 12/02/2019 02:09, Kees Cook wrote: >> > > On Mon, Feb 11, 2019 at 3:28 PM Igor Stoppa <igor.stoppa@gmail.com> wrote: >> > > It looked like only the memset() needed architecture support. Is there >> > > a reason for not being able to implement memset() in terms of an >> > > inefficient put_user() loop instead? That would eliminate the need for >> > > per-arch support, yes? >> > >> > So far, yes, however from previous discussion about power arch, I >> > understood this implementation would not be so easy to adapt. >> > Lacking other examples where the extra mapping could be used, I did not >> > want to add code without a use case. >> > >> > Probably both arm and x86 32 bit could do, but I would like to first get >> > to the bitter end with memory protection (the other 2 thirds). >> > >> > Mostly, I hated having just one arch and I also really wanted to have arm64. >> >> Right, I meant, if you implemented the _memset() case with put_user() >> in this version, you could drop the arch-specific _memset() and shrink >> the patch series. Then you could also enable this across all the >> architectures in one patch. (Would you even need the Kconfig patches, >> i.e. won't this "Just Work" on everything with an MMU?) > > > I had similar thoughts, but this answer [1] deflated my hopes (if I understood it correctly). > It seems that each arch needs to be massaged in separately. True, but I think x86_64, x86, arm64, and arm will all be "normal". power may be that way too, but they always surprise me. :) Anyway, series looks good, but since nothing uses _memset(), it might make sense to leave it out and put all the arch-enabling into a single patch to cover the 4 archs above, in an effort to make the series even smaller.
On 12/02/2019 03:26, Kees Cook wrote: > On Mon, Feb 11, 2019 at 5:08 PM igor.stoppa@gmail.com > <igor.stoppa@gmail.com> wrote: >> >> >> >> On Tue, 12 Feb 2019, 4.47 Kees Cook <keescook@chromium.org wrote: >>> >>> On Mon, Feb 11, 2019 at 4:37 PM Igor Stoppa <igor.stoppa@gmail.com> wrote: >>>> >>>> >>>> >>>> On 12/02/2019 02:09, Kees Cook wrote: >>>>> On Mon, Feb 11, 2019 at 3:28 PM Igor Stoppa <igor.stoppa@gmail.com> wrote: >>>>> It looked like only the memset() needed architecture support. Is there >>>>> a reason for not being able to implement memset() in terms of an >>>>> inefficient put_user() loop instead? That would eliminate the need for >>>>> per-arch support, yes? >>>> >>>> So far, yes, however from previous discussion about power arch, I >>>> understood this implementation would not be so easy to adapt. >>>> Lacking other examples where the extra mapping could be used, I did not >>>> want to add code without a use case. >>>> >>>> Probably both arm and x86 32 bit could do, but I would like to first get >>>> to the bitter end with memory protection (the other 2 thirds). >>>> >>>> Mostly, I hated having just one arch and I also really wanted to have arm64. >>> >>> Right, I meant, if you implemented the _memset() case with put_user() >>> in this version, you could drop the arch-specific _memset() and shrink >>> the patch series. Then you could also enable this across all the >>> architectures in one patch. (Would you even need the Kconfig patches, >>> i.e. won't this "Just Work" on everything with an MMU?) >> >> >> I had similar thoughts, but this answer [1] deflated my hopes (if I understood it correctly). >> It seems that each arch needs to be massaged in separately. > > True, but I think x86_64, x86, arm64, and arm will all be "normal". > power may be that way too, but they always surprise me. :) > > Anyway, series looks good, but since nothing uses _memset(), it might > make sense to leave it out and put all the arch-enabling into a single > patch to cover the 4 archs above, in an effort to make the series even > smaller. Actually, I do use it, albeit indirectly. That's the whole point of having the IMA patch as example. This is the fragment: ---------------- @@ -460,12 +460,13 @@ void ima_update_policy_flag(void) list_for_each_entry(entry, ima_rules, list) { if (entry->action & IMA_DO_MASK) - ima_policy_flag |= entry->action; + wr_assign(ima_policy_flag, + ima_policy_flag | entry->action); } ima_appraise |= (build_ima_appraise | temp_ima_appraise); if (!ima_appraise) - ima_policy_flag &= ~IMA_APPRAISE; + wr_assign(ima_policy_flag, ima_policy_flag & ~IMA_APPRAISE); } ---------------- wr_assign() does just that. However, reading again your previous mails, I realize that I might have misinterpreted what you were suggesting. If the advice is to have also a default memset_user() which relies on put_user(), but do not activate the feature by default for every architecture, I definitely agree that it would be good to have it. I just didn't think about it before. What I cannot do is to turn it on for all the architectures prior to test it and atm I do not have means to do it. But I now realize that most likely you were just suggesting to have full, albeit inefficient default support and then let various archs review/enhance it. I can certainly do this. Regarding testing I have a question: how much can/should I lean on qemu? In most cases the MMU might not need to be fully emulated, so I wonder how well qemu-based testing can ensure that real life scenarios will work. -- igor
On Mon, Feb 11, 2019 at 11:09 PM Igor Stoppa <igor.stoppa@gmail.com> wrote: > wr_assign() does just that. > > However, reading again your previous mails, I realize that I might have > misinterpreted what you were suggesting. > > If the advice is to have also a default memset_user() which relies on > put_user(), but do not activate the feature by default for every > architecture, I definitely agree that it would be good to have it. > I just didn't think about it before. Yeah, I just mean you could have an arch-agnostic memset_user() implementation. > But I now realize that most likely you were just suggesting to have > full, albeit inefficient default support and then let various archs > review/enhance it. I can certainly do this. Right. > Regarding testing I have a question: how much can/should I lean on qemu? > In most cases the MMU might not need to be fully emulated, so I wonder > how well qemu-based testing can ensure that real life scenarios will work. I think qemu lets you know if it works (kvm is using the real MMU), and baremetal will give you more stable performance numbers.