Message ID | 20210319144058.772525-1-dja@axtens.net (mailing list archive) |
---|---|
Headers | show |
Series | KASAN for powerpc64 radix | expand |
On Sat, Mar 20, 2021 at 01:40:52AM +1100, Daniel Axtens wrote: > Building on the work of Christophe, Aneesh and Balbir, I've ported > KASAN to 64-bit Book3S kernels running on the Radix MMU. > > v11 applies to next-20210317. I had hoped to have it apply to > powerpc/next but once again there are changes in the kasan core that > clash. Also, thanks to mpe for fixing a build break with KASAN off. > > I'm not sure how best to progress this towards actually being merged > when it has impacts across subsystems. I'd appreciate any input. Maybe > the first four patches could go in via the kasan tree, that should > make things easier for powerpc in a future cycle? > > v10 rebases on top of next-20210125, fixing things up to work on top > of the latest changes, and fixing some review comments from > Christophe. I have tested host and guest with 64k pages for this spin. > > There is now only 1 failing KUnit test: kasan_global_oob - gcc puts > the ASAN init code in a section called '.init_array'. Powerpc64 module > loading code goes through and _renames_ any section beginning with > '.init' to begin with '_init' in order to avoid some complexities > around our 24-bit indirect jumps. This means it renames '.init_array' > to '_init_array', and the generic module loading code then fails to > recognise the section as a constructor and thus doesn't run it. This > hack dates back to 2003 and so I'm not going to try to unpick it in > this series. (I suspect this may have previously worked if the code > ended up in .ctors rather than .init_array but I don't keep my old > binaries around so I have no real way of checking.) > > (The previously failing stack tests are now skipped due to more > accurate configuration settings.) > > Details from v9: This is a significant reworking of the previous > versions. Instead of the previous approach which supported inline > instrumentation, this series provides only outline instrumentation. > > To get around the problem of accessing the shadow region inside code we run > with translations off (in 'real mode'), we we restrict checking to when > translations are enabled. This is done via a new hook in the kasan core and > by excluding larger quantites of arch code from instrumentation. The upside > is that we no longer require that you be able to specify the amount of > physically contiguous memory on the system at compile time. Hopefully this > is a better trade-off. More details in patch 6. > > kexec works. Both 64k and 4k pages work. Running as a KVM host works, but > nothing in arch/powerpc/kvm is instrumented. It's also potentially a bit > fragile - if any real mode code paths call out to instrumented code, things > will go boom. > The last time I checked, the changes for real mode, made the code hard to review/maintain. I am happy to see that we've decided to leave that off the table for now, reviewing the series Balbir Singh.
Le 19/03/2021 à 15:40, Daniel Axtens a écrit : > Building on the work of Christophe, Aneesh and Balbir, I've ported > KASAN to 64-bit Book3S kernels running on the Radix MMU. > > v11 applies to next-20210317. I had hoped to have it apply to > powerpc/next but once again there are changes in the kasan core that > clash. Also, thanks to mpe for fixing a build break with KASAN off. > > I'm not sure how best to progress this towards actually being merged > when it has impacts across subsystems. I'd appreciate any input. Maybe > the first four patches could go in via the kasan tree, that should > make things easier for powerpc in a future cycle? > > v10 rebases on top of next-20210125, fixing things up to work on top > of the latest changes, and fixing some review comments from > Christophe. I have tested host and guest with 64k pages for this spin. > > There is now only 1 failing KUnit test: kasan_global_oob - gcc puts > the ASAN init code in a section called '.init_array'. Powerpc64 module > loading code goes through and _renames_ any section beginning with > '.init' to begin with '_init' in order to avoid some complexities > around our 24-bit indirect jumps. This means it renames '.init_array' > to '_init_array', and the generic module loading code then fails to > recognise the section as a constructor and thus doesn't run it. This > hack dates back to 2003 and so I'm not going to try to unpick it in > this series. (I suspect this may have previously worked if the code > ended up in .ctors rather than .init_array but I don't keep my old > binaries around so I have no real way of checking.) > > (The previously failing stack tests are now skipped due to more > accurate configuration settings.) > > Details from v9: This is a significant reworking of the previous > versions. Instead of the previous approach which supported inline > instrumentation, this series provides only outline instrumentation. > > To get around the problem of accessing the shadow region inside code we run > with translations off (in 'real mode'), we we restrict checking to when > translations are enabled. This is done via a new hook in the kasan core and > by excluding larger quantites of arch code from instrumentation. The upside > is that we no longer require that you be able to specify the amount of > physically contiguous memory on the system at compile time. Hopefully this > is a better trade-off. More details in patch 6. > > kexec works. Both 64k and 4k pages work. Running as a KVM host works, but > nothing in arch/powerpc/kvm is instrumented. It's also potentially a bit > fragile - if any real mode code paths call out to instrumented code, things > will go boom. > In the discussion we had long time ago, https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190806233827.16454-5-dja@axtens.net/#2321067 , I challenged you on why it was not possible to implement things the same way as other architectures, in extenso with an early mapping. Your first answer was that too many things were done in real mode at startup. After some discussion you said that finally there was not that much things at startup but the issue was KVM. Now you say that instrumentation on KVM is fully disabled. So my question is, if KVM is not a problem anymore, why not go the standard way with an early shadow ? Then you could also support inline instrumentation. Christophe
Hi Christophe, > In the discussion we had long time ago, > https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190806233827.16454-5-dja@axtens.net/#2321067 > , I challenged you on why it was not possible to implement things the same way as other > architectures, in extenso with an early mapping. > > Your first answer was that too many things were done in real mode at startup. After some discussion > you said that finally there was not that much things at startup but the issue was KVM. > > Now you say that instrumentation on KVM is fully disabled. > > So my question is, if KVM is not a problem anymore, why not go the standard way with an early shadow > ? Then you could also support inline instrumentation. Fair enough, I've had some trouble both understanding the problem myself and clearly articulating it. Let me try again. We need translations on to access the shadow area. We reach setup_64.c::early_setup() with translations off. At this point we don't know what MMU we're running under, or our CPU features. To determine our MMU and CPU features, early_setup() calls functions (dt_cpu_ftrs_init, early_init_devtree) that call out to generic code like of_scan_flat_dt. We need to do this before we turn on translations because we can't set up the MMU until we know what MMU we have. So this puts us in a bind: - We can't set up an early shadow until we have translations on, which requires that the MMU is set up. - We can't set up an MMU until we call out to generic code for FDT parsing. So there will be calls to generic FDT parsing code that happen before the early shadow is set up. The setup code also prints a bunch of information about the platform with printk() while translations are off, so it wouldn't even be enough to disable instrumentation for bits of the generic DT code on ppc64. Does that make sense? If you can figure out how to 'square the circle' here I'm all ears. Other notes: - There's a comment about printk() being 'safe' in early_setup(), that refers to having a valid PACA, it doesn't mean that it's safe in any other sense. - KVM does indeed also run stuff with translations off but we can catch all of that by disabling instrumentation on the real-mode handlers: it doesn't seem to leak out to generic code. So you are right that KVM is no longer an issue. Kind regards, Daniel > > Christophe
Le 23/03/2021 à 02:21, Daniel Axtens a écrit : > Hi Christophe, > >> In the discussion we had long time ago, >> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190806233827.16454-5-dja@axtens.net/#2321067 >> , I challenged you on why it was not possible to implement things the same way as other >> architectures, in extenso with an early mapping. >> >> Your first answer was that too many things were done in real mode at startup. After some discussion >> you said that finally there was not that much things at startup but the issue was KVM. >> >> Now you say that instrumentation on KVM is fully disabled. >> >> So my question is, if KVM is not a problem anymore, why not go the standard way with an early shadow >> ? Then you could also support inline instrumentation. > > Fair enough, I've had some trouble both understanding the problem myself > and clearly articulating it. Let me try again. > > We need translations on to access the shadow area. > > We reach setup_64.c::early_setup() with translations off. At this point > we don't know what MMU we're running under, or our CPU features. What do you need to know ? Whether it is Hash or Radix, or more/different details ? IIUC, today we only support KASAN on Radix. Would it make sense to say that a kernel built with KASAN can only run on processors having Radix capacility ? Then select CONFIG_PPC_RADIX_MMU_DEFAULT when KASAN is set, and accept that the kernel crashes if Radix is not available ? > > To determine our MMU and CPU features, early_setup() calls functions > (dt_cpu_ftrs_init, early_init_devtree) that call out to generic code > like of_scan_flat_dt. We need to do this before we turn on translations > because we can't set up the MMU until we know what MMU we have. > > So this puts us in a bind: > > - We can't set up an early shadow until we have translations on, which > requires that the MMU is set up. > > - We can't set up an MMU until we call out to generic code for FDT > parsing. > > So there will be calls to generic FDT parsing code that happen before the > early shadow is set up. I see some logic in kernel/prom_init.c for detecting MMU. Can we get the information from there in order to setup the MMU ? > > The setup code also prints a bunch of information about the platform > with printk() while translations are off, so it wouldn't even be enough > to disable instrumentation for bits of the generic DT code on ppc64. I'm sure the printk() stuff can be avoided or delayed without much problems, I guess the main problem is the DT code, isn't it ? As far as I can see the code only use udbg_printf() before MMU is on, and this could be simply skipped when KASAN is selected, I see no situation where you need early printk together with KASAN. > > Does that make sense? If you can figure out how to 'square the circle' > here I'm all ears. Yes it is a lot more clear now, thanks you. Gave a few ideas above, does it help ? > > Other notes: > > - There's a comment about printk() being 'safe' in early_setup(), that > refers to having a valid PACA, it doesn't mean that it's safe in any > other sense. > > - KVM does indeed also run stuff with translations off but we can catch > all of that by disabling instrumentation on the real-mode handlers: > it doesn't seem to leak out to generic code. So you are right that > KVM is no longer an issue. > Christophe
Christophe Leroy <christophe.leroy@csgroup.eu> writes: > Le 23/03/2021 à 02:21, Daniel Axtens a écrit : >> Hi Christophe, >> >>> In the discussion we had long time ago, >>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190806233827.16454-5-dja@axtens.net/#2321067 >>> , I challenged you on why it was not possible to implement things the same way as other >>> architectures, in extenso with an early mapping. >>> >>> Your first answer was that too many things were done in real mode at startup. After some discussion >>> you said that finally there was not that much things at startup but the issue was KVM. >>> >>> Now you say that instrumentation on KVM is fully disabled. >>> >>> So my question is, if KVM is not a problem anymore, why not go the standard way with an early shadow >>> ? Then you could also support inline instrumentation. >> >> Fair enough, I've had some trouble both understanding the problem myself >> and clearly articulating it. Let me try again. >> >> We need translations on to access the shadow area. >> >> We reach setup_64.c::early_setup() with translations off. At this point >> we don't know what MMU we're running under, or our CPU features. > > What do you need to know ? Whether it is Hash or Radix, or > more/different details ? Yes, as well as some other details like SLB size, supported segment & page sizes, possibly the CPU version for workarounds, various other device tree things. You also need to know if you're bare metal or in a guest, or on a PS3 ... > IIUC, today we only support KASAN on Radix. Would it make sense to say that a kernel built with > KASAN can only run on processors having Radix capacility ? Then select CONFIG_PPC_RADIX_MMU_DEFAULT > when KASAN is set, and accept that the kernel crashes if Radix is not available ? I would rather not. We already have some options like that (EARLY_DEBUG), and they have caused people to waste time debugging crashes over the years that turned out to just due to the wrong CONFIG selected. >> To determine our MMU and CPU features, early_setup() calls functions >> (dt_cpu_ftrs_init, early_init_devtree) that call out to generic code >> like of_scan_flat_dt. We need to do this before we turn on translations >> because we can't set up the MMU until we know what MMU we have. >> >> So this puts us in a bind: >> >> - We can't set up an early shadow until we have translations on, which >> requires that the MMU is set up. >> >> - We can't set up an MMU until we call out to generic code for FDT >> parsing. >> >> So there will be calls to generic FDT parsing code that happen before the >> early shadow is set up. > > I see some logic in kernel/prom_init.c for detecting MMU. Can we get the information from there in > order to setup the MMU ? You could find some of the information, but you'd need to stash it somewhere (like the flat device tree :P) because you can't turn the MMU on until we shutdown open firmware. That also doesn't help you on bare metal where we don't use prom_init. >> The setup code also prints a bunch of information about the platform >> with printk() while translations are off, so it wouldn't even be enough >> to disable instrumentation for bits of the generic DT code on ppc64. > > I'm sure the printk() stuff can be avoided or delayed without much problems, I guess the main > problem is the DT code, isn't it ? We spent many years making printk() work for early boot messages, because it has the nice property of being persisted in dmesg. But possibly we could come up with some workaround for that. Disabling KASAN for the flat DT code seems like it wouldn't be a huge loss, most (all?) of that code should only run at boot anyway. But we also have code spread out in various files that would need to be built without KASAN. See eg. everything called by of_scan_flat_dt(), mmu_early_init_devtree(), pseries_probe_fw_features() pkey_early_init_devtree() etc. Because we can only disable KASAN per-file that would require quite a bit of code movement and related churn. > As far as I can see the code only use udbg_printf() before MMU is on, and this could be simply > skipped when KASAN is selected, I see no situation where you need early printk together with KASAN. We definitely use printk() before the MMU is on. >> Does that make sense? If you can figure out how to 'square the circle' >> here I'm all ears. > > Yes it is a lot more clear now, thanks you. Gave a few ideas above, > does it help ? A little? :) It's possible we could do slightly less of the current boot sequence before turning the MMU on. But we would still need to scan the flat device tree, so all that code would be implicated either way. We could also rearrange the early boot code to put bits in separate files so they can be built without KASAN, but like I said above that would be a lot of churn. I don't see a way to fix printk() though, other than not using it during early boot. Maybe that's OK but it feels like a bit of a backward step. There's also other issues, like if we WARN during early boot that causes a program check and that runs all sorts of code, some of which would have KASAN enabled. So I don't see an easy path to enabling inline instrumentation. It's obviously possible, but I don't think it's something we can get done in any reasonable time frame. cheers