| Message ID | E1YfmY6-0006X3-1N@rmk-PC.arm.linux.org.uk (mailing list archive) |
|---|---|
| State | New, archived |
Dear Russell King,

On Wed, 08 Apr 2015 10:45:30 +0100, Russell King wrote:

> Re-implement the physical address space switching to be architecturally
> complaint. This involves flushing the caches, disabling the MMU, and

Nit: s/complaint/compliant/.

But since I'm not going to write an e-mail just for such a small nit, here is
a hopefully more interesting question: do you think it would be possible to
use this logic for my Armada 370/XP/375/38x case, where I need a specific
page table configuration to be able to use HW I/O coherency in non-SMP
configurations?

If you remember the discussion we had, the primary issue is that the page
tables are built in arch/arm/kernel/head.S, before we even know which SoC we
are running on (and the CPU ID available from CP15 is not sufficient to
determine whether the SoC supports HW I/O coherency). If we can get into the
SoC-specific code and decide at that point whether and how the page tables
should be rebuilt (which is apparently what happens on Keystone), then it
would also be useful for our case, I believe.

Thanks,

Thomas
On 4/8/2015 7:34 AM, Thomas Petazzoni wrote:
> Dear Russell King,
>
> On Wed, 08 Apr 2015 10:45:30 +0100, Russell King wrote:
>> Re-implement the physical address space switching to be architecturally
>> complaint. This involves flushing the caches, disabling the MMU, and
>
> Nit: s/complaint/compliant/.
>
> But since I'm not going to write an e-mail just for such a small nit,
> here is an hopefully more interesting question: do you think it would
> be possible to use this logic for my Armada 370/XP/375/38x case, where
> I need some specific page table configuration to be able to use HW I/O
> coherency in non-SMP configurations?
>
> If you remember the discussion we had, the primary issue is that the
> page tables are built in arch/arm/kernel/head.S, before we even know
> which SoC we are running on (and the CPU ID available from CP15 is not
> sufficient to determine whether the SoC supports HW I/O coherency). If
> we can get into the SoC-specific code, and at this point decide whether
> and how the page tables should be rebuilt (which is apparently what
> happens in Keystone), then it would be also useful for our case I
> believe.
>
Well, Keystone does switch the page tables to use entirely different memory,
but what you are talking about is controlling the page table attributes for
certain pages (a memory area). The head.S page tables are just for boot-up;
the actual page table creation happens in paging_init(). A special MT_* area
might be all you need to take care of your case, but I am missing details,
so I'm not sure.

Regards,
Santosh
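As a purely illustrative sketch of the hook Santosh mentions: SoC code normally influences the final page tables from its ->map_io() callback, which runs under paging_init() and can install mappings with a chosen MT_* type via iotable_init(). All names and addresses below are made up, and this only covers static I/O mappings; it does not by itself give the RAM-attribute control the Armada case needs.

```c
/*
 * Illustrative sketch, not taken from this thread.  The ->map_io()
 * callback is invoked from paging_init(), so this is where an SoC can
 * pick a memory type (MT_*) for a region.  Names/addresses are made up.
 */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/sizes.h>
#include <asm/memory.h>
#include <asm/mach/map.h>

static struct map_desc example_io_desc[] __initdata = {
	{
		.virtual = 0xfec00000,			/* hypothetical VA */
		.pfn     = __phys_to_pfn(0xd0000000),	/* hypothetical PA */
		.length  = SZ_1M,
		.type    = MT_DEVICE,			/* memory type chosen here */
	},
};

static void __init example_map_io(void)
{
	iotable_init(example_io_desc, ARRAY_SIZE(example_io_desc));
}
```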
Hi Russell, On Wed, Apr 08, 2015 at 10:45:30AM +0100, Russell King wrote: > Re-implement the physical address space switching to be architecturally > complaint. This involves flushing the caches, disabling the MMU, and > only then updating the page tables. Once that is complete, the system > can be brought back up again. > > Since we disable the MMU, we need to do the update in assembly code. > Luckily, the entries which need updating are fairly trivial, and are > all setup by the early assembly code. We can merely adjust each entry > by the delta required. > > Not only does htis fix the code to be architecturally compliant, but it > fixes a couple of bugs too: Nit: s/htis/this/ [...] > - /* > - * Ensure that the above updates are flushed out of the cache. > - * This is not strictly correct; on a system where the caches > - * are coherent with each other, but the MMU page table walks > - * may not be coherent, flush_cache_all() may be a no-op, and > - * this will fail. > + * We changing not only the virtual to physical mapping, but > + * also the physical addresses used to access memory. We need > + * to flush all levels of cache in the system with caching > + * disabled to ensure that all data is written back. We do this > + * with caching and write buffering disabled to ensure that > + * nothing is speculatively prefetched. > */ > + cr = get_cr(); > + set_cr(cr & ~(CR_I | CR_C | CR_W)); SCTLR[3] (CR_W) is RAO/SBOP in VMSAv7. I don't think we should clear it here for ARMv7. To the best of my knowledge, the page table walkers aren't affected by SCTLR.C, and use the attributes in TTBR{0,1} or TTBCR when translation is active (i.e. when SCTLR.M is set). So at this point they can make cacheable accesses to the page tables (and allocate into the caches) in the background... > flush_cache_all(); ...meaning this flush may not leave the caches empty, at least for memory regions containing page tables. Stale lines could mask updates made in lpae_pgtables_remap_asm. I think that the cache flush needs to be performed after both SCTLR.{C,M} are cleared in lpae_pgtables_remap_asm, just before the page table updates. So long as the relevant pieces of kernel text are initially clean to the PoC, we shouldn't need to flush anything beforehand. > > /* > - * Re-write the TTBR values to point them at the high physical > - * alias of the page tables. We expect __va() will work on > - * cpu_get_pgd(), which returns the value of TTBR0. > + * Fixup the page tables - this must be in the idmap region as > + * we need to disable the MMU to do this safely, and hence it > + * needs to be assembly. It's fairly simple, as we're using the > + * temporary tables setup by the initial assembly code. > */ > - cpu_switch_mm(pgd0, &init_mm); > - cpu_set_ttbr(1, __pa(pgd0) + TTBR1_OFFSET); > + lpae_pgtables_remap(offset, pa_pgd, boot_data); > > - /* Finally flush any stale TLB values. */ > - local_flush_bp_all(); > - local_flush_tlb_all(); > + /* Re-enable the caches */ > + set_cr(cr); That being the case there's no reason to restore M and C separately on the return path either. [...] > +ENTRY(lpae_pgtables_remap_asm) > + stmfd sp!, {r4-r8, lr} > + > + mrc p15, 0, r8, c1, c0, 0 @ read control reg > + bic ip, r8, #CR_M @ disable caches and MMU > + mcr p15, 0, ip, c1, c0, 0 > + dsb > + isb Shouldn't the DSB be between the STMFD and the MCR (given the SP doesn't point to an idmap/physical address)? I don't see why we need a DSB after the write to the SCTLR. [...] 
> +	dsb
> +	isb
> +
> +	mcr	p15, 0, r8, c1, c0, 0	@ re-enable MMU
> +	dsb
> +	isb

Similarly, isn't the last DSB redundant?

Thanks,
Mark.
On Wed, Apr 08, 2015 at 06:36:56PM +0100, Mark Rutland wrote: > Hi Russell, > > On Wed, Apr 08, 2015 at 10:45:30AM +0100, Russell King wrote: > > - /* > > - * Ensure that the above updates are flushed out of the cache. > > - * This is not strictly correct; on a system where the caches > > - * are coherent with each other, but the MMU page table walks > > - * may not be coherent, flush_cache_all() may be a no-op, and > > - * this will fail. > > + * We changing not only the virtual to physical mapping, but > > + * also the physical addresses used to access memory. We need > > + * to flush all levels of cache in the system with caching > > + * disabled to ensure that all data is written back. We do this > > + * with caching and write buffering disabled to ensure that > > + * nothing is speculatively prefetched. > > */ > > + cr = get_cr(); > > + set_cr(cr & ~(CR_I | CR_C | CR_W)); > > SCTLR[3] (CR_W) is RAO/SBOP in VMSAv7. I don't think we should clear it > here for ARMv7. I was in two minds about that - I guess as we're expecting to only run this on ARMv7 CPUs, we can omit clearing the CR_W, but I'd need to add a comment saying that it's ARMv7 only. > To the best of my knowledge, the page table walkers aren't affected by > SCTLR.C, and use the attributes in TTBR{0,1} or TTBCR when translation > is active (i.e. when SCTLR.M is set). So at this point they can make > cacheable accesses to the page tables (and allocate into the caches) in > the background... We had better clear those bits too then. > I think that the cache flush needs to be performed after both > SCTLR.{C,M} are cleared in lpae_pgtables_remap_asm, just before the page > table updates. So long as the relevant pieces of kernel text are > initially clean to the PoC, we shouldn't need to flush anything > beforehand. To that I say... no bloody way in hell, even once hell has frozen over. It took almost a _day_ to get this much working, much of it was attempting to use cache flushing functions after the MMU had been turned off. If it was possible to make it work, I'd have done so. It isn't, so please forget the idea. > > +ENTRY(lpae_pgtables_remap_asm) > > + stmfd sp!, {r4-r8, lr} > > + > > + mrc p15, 0, r8, c1, c0, 0 @ read control reg > > + bic ip, r8, #CR_M @ disable caches and MMU > > + mcr p15, 0, ip, c1, c0, 0 > > + dsb > > + isb > > Shouldn't the DSB be between the STMFD and the MCR (given the SP doesn't > point to an idmap/physical address)? > > I don't see why we need a DSB after the write to the SCTLR. > > [...] > > > + dsb > > + isb > > + > > + mcr p15, 0, r8, c1, c0, 0 @ re-enable MMU > > + dsb > > + isb > > Similarly, isn't the last DSB redundant? I've really no idea. All I know is that the above works. I'm rather sick of trying to read the ARM ARM and not understanding exactly what ISB and DSB actually do.
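For reference while following the CR_W exchange, here is a small sketch of the SCTLR bit masks involved, using the values defined in arch/arm/include/asm/cp15.h. The helper name is made up for illustration; the "ARMv7-only" variant merely reflects the suggestion above, not merged code.

```c
/*
 * Sketch only: SCTLR bits under discussion, values as in
 * arch/arm/include/asm/cp15.h.  pv_fixup_cr_mask() is a made-up helper.
 */
#define CR_M	(1 << 0)	/* MMU enable */
#define CR_C	(1 << 2)	/* D-cache enable */
#define CR_W	(1 << 3)	/* Write buffer enable; RAO/SBOP on VMSAv7 */
#define CR_I	(1 << 12)	/* I-cache enable */

static unsigned int pv_fixup_cr_mask(unsigned int cr, int armv7_only)
{
	/*
	 * The posted patch clears I, C and W before flushing; the review
	 * suggestion is that an ARMv7-only path can leave the
	 * write-as-one W bit alone.
	 */
	return armv7_only ? cr & ~(CR_I | CR_C)
			  : cr & ~(CR_I | CR_C | CR_W);
}
```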
On Wed, Apr 08, 2015 at 04:34:28PM +0200, Thomas Petazzoni wrote:
> But since I'm not going to write an e-mail just for such a small nit,
> here is an hopefully more interesting question: do you think it would
> be possible to use this logic for my Armada 370/XP/375/38x case, where
> I need some specific page table configuration to be able to use HW I/O
> coherency in non-SMP configurations?

This implementation only handles LPAE, and it only knows how to add an
offset to each LPAE page table descriptor. So, adapting this code to that
application isn't going to be easy, because you're talking about setting
and/or clearing some bits. So, I think it would take a re-implementation
of it to make it work.

Once the review comments have been addressed, at least this gives an idea
of how to address the coherency issue there.
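A minimal sketch (plain C, not kernel code) of the distinction Russell draws: the Keystone fix-up adds a constant delta to every LPAE descriptor, whereas the coherency case would need to set or clear attribute bits. Bit positions follow the LPAE long-descriptor format; the shareability field is used only as one plausible attribute, not a statement of what Armada actually requires.

```c
/*
 * Plain C illustration, not kernel code.  LPAE long-descriptor layout:
 * output address in bits [39:12], SH[1:0] in bits [9:8], AF in bit 10.
 */
#include <stdint.h>

#define LPAE_SH_INNER_SHAREABLE	(3ULL << 8)

/*
 * What the Keystone fix-up does to every entry: a single 64-bit add of
 * the physical delta.  The delta is section-aligned, so the low
 * attribute bits are never disturbed by the addition.
 */
static uint64_t pv_fixup_entry(uint64_t desc, uint64_t delta)
{
	return desc + delta;
}

/*
 * What an attribute fix-up (e.g. for HW I/O coherency) would do instead:
 * leave the output address alone and flip attribute bits, here the
 * shareability field as one plausible example.
 */
static uint64_t make_entry_inner_shareable(uint64_t desc)
{
	return desc | LPAE_SH_INNER_SHAREABLE;
}
```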
On 4/8/2015 10:55 AM, Russell King - ARM Linux wrote:
> On Wed, Apr 08, 2015 at 06:36:56PM +0100, Mark Rutland wrote:
>> Hi Russell,
>>
>> On Wed, Apr 08, 2015 at 10:45:30AM +0100, Russell King wrote:
>>> -	/*
>>> -	 * Ensure that the above updates are flushed out of the cache.
>>> -	 * This is not strictly correct; on a system where the caches
>>> -	 * are coherent with each other, but the MMU page table walks
>>> -	 * may not be coherent, flush_cache_all() may be a no-op, and
>>> -	 * this will fail.
>>> +	 * We changing not only the virtual to physical mapping, but
>>> +	 * also the physical addresses used to access memory. We need
>>> +	 * to flush all levels of cache in the system with caching
>>> +	 * disabled to ensure that all data is written back. We do this
>>> +	 * with caching and write buffering disabled to ensure that
>>> +	 * nothing is speculatively prefetched.
>>> 	 */
>>> +	cr = get_cr();
>>> +	set_cr(cr & ~(CR_I | CR_C | CR_W));
>>
>> SCTLR[3] (CR_W) is RAO/SBOP in VMSAv7. I don't think we should clear it
>> here for ARMv7.
>
> I was in two minds about that - I guess as we're expecting to only run
> this on ARMv7 CPUs, we can omit clearing the CR_W, but I'd need to add
> a comment saying that it's ARMv7 only.
>
Yep. We can do away without the CR_W change.

>> To the best of my knowledge, the page table walkers aren't affected by
>> SCTLR.C, and use the attributes in TTBR{0,1} or TTBCR when translation
>> is active (i.e. when SCTLR.M is set). So at this point they can make
>> cacheable accesses to the page tables (and allocate into the caches) in
>> the background...
>
> We had better clear those bits too then.
>
>> I think that the cache flush needs to be performed after both
>> SCTLR.{C,M} are cleared in lpae_pgtables_remap_asm, just before the page
>> table updates. So long as the relevant pieces of kernel text are
>> initially clean to the PoC, we shouldn't need to flush anything
>> beforehand.
>
> To that I say... no bloody way in hell, even once hell has frozen
> over. It took almost a _day_ to get this much working, much of it
> was attempting to use cache flushing functions after the MMU had
> been turned off.
>
> If it was possible to make it work, I'd have done so. It isn't, so
> please forget the idea.
>
I fully agree. I went through the same pain while incorporating Will's
comment on similar lines last time.

>>> +ENTRY(lpae_pgtables_remap_asm)
>>> +	stmfd	sp!, {r4-r8, lr}
>>> +
>>> +	mrc	p15, 0, r8, c1, c0, 0	@ read control reg
>>> +	bic	ip, r8, #CR_M		@ disable caches and MMU
>>> +	mcr	p15, 0, ip, c1, c0, 0
>>> +	dsb
>>> +	isb
>>
>> Shouldn't the DSB be between the STMFD and the MCR (given the SP doesn't
>> point to an idmap/physical address)?
>>
>> I don't see why we need a DSB after the write to the SCTLR.
>>
dsb can be moved up after stmfd but leaving as above should be fine
as well.

>> [...]
>>
>>> +	dsb
>>> +	isb
>>> +
>>> +	mcr	p15, 0, r8, c1, c0, 0	@ re-enable MMU
>>> +	dsb
>>> +	isb
>>
>> Similarly, isn't the last DSB redundant?
>
This dsb probably can be dropped but I leave that to Russell to decide.
That one extra instruction doesn't hurt much.

Regards,
Santosh

> I've really no idea. All I know is that the above works. I'm rather
> sick of trying to read the ARM ARM and not understanding exactly what
> ISB and DSB actually do.
>
Hi, > >> To the best of my knowledge, the page table walkers aren't affected by > >> SCTLR.C, and use the attributes in TTBR{0,1} or TTBCR when translation > >> is active (i.e. when SCTLR.M is set). So at this point they can make > >> cacheable accesses to the page tables (and allocate into the caches) in > >> the background... > > > > We had better clear those bits too then. > > > >> I think that the cache flush needs to be performed after both > >> SCTLR.{C,M} are cleared in lpae_pgtables_remap_asm, just before the page > >> table updates. So long as the relevant pieces of kernel text are > >> initially clean to the PoC, we shouldn't need to flush anything > >> beforehand. > > > > To that I say... no bloody way in hell, even once hell has frozen > > over. It took almost a _day_ to get this much working, much of it > > was attempting to use cache flushing functions after the MMU had > > been turned off. > > > > If it was possible to make it work, I'd have done so. It isn't, so > > please forget the idea. > > > I fully agree. I gone through the same pane while incorporating Will's > comment on similar lines last time. I'm surprised that it's so painful to get that working. I don't have a system I could test this on, so unfortunately I can't offer much help there. > >>> +ENTRY(lpae_pgtables_remap_asm) > >>> + stmfd sp!, {r4-r8, lr} > >>> + > >>> + mrc p15, 0, r8, c1, c0, 0 @ read control reg > >>> + bic ip, r8, #CR_M @ disable caches and MMU > >>> + mcr p15, 0, ip, c1, c0, 0 > >>> + dsb > >>> + isb > >> > >> Shouldn't the DSB be between the STMFD and the MCR (given the SP doesn't > >> point to an idmap/physical address)? > >> > >> I don't see why we need a DSB after the write to the SCTLR. > >> > dsb can be moved up after stmfd but leaving as above should be fine > as well. I don't think that it's safe to leave it where it is. Currently the STMFD could be reordered w.r.t. the cp15 accesses, and hence the write may occur with translation disabled (and would go to the wrong physical address). We need to ensure that the STMFD is executed before the MCR potentially changes the execution context. The DSB will ensure that in addition to ensuring completion of the write (i.e. it isn't left sat in a write buffer). > >> [...] > >> > >>> + dsb > >>> + isb > >>> + > >>> + mcr p15, 0, r8, c1, c0, 0 @ re-enable MMU > >>> + dsb > >>> + isb > >> > >> Similarly, isn't the last DSB redundant? > > > This dsb probably can be dropped but I leave that to Russell > to decide. That one extra instruction doesn't hurt much. I don't see that it adds anything to the ISB given the DSB; ISB prior to the write to the SCTLR -- there's nothing it would ensure completion of that the first DSB won't already have. Mark.
On 4/15/2015 5:07 AM, Mark Rutland wrote:
> Hi,
>
[..]
>>>>> +ENTRY(lpae_pgtables_remap_asm)
>>>>> +	stmfd	sp!, {r4-r8, lr}
>>>>> +
>>>>> +	mrc	p15, 0, r8, c1, c0, 0	@ read control reg
>>>>> +	bic	ip, r8, #CR_M		@ disable caches and MMU
>>>>> +	mcr	p15, 0, ip, c1, c0, 0
>>>>> +	dsb
>>>>> +	isb
>>>>
>>>> Shouldn't the DSB be between the STMFD and the MCR (given the SP doesn't
>>>> point to an idmap/physical address)?
>>>>
>>>> I don't see why we need a DSB after the write to the SCTLR.
>>>>
>> dsb can be moved up after stmfd but leaving as above should be fine
>> as well.
>
> I don't think that it's safe to leave it where it is. Currently the
> STMFD could be reordered w.r.t. the cp15 accesses, and hence the write
> may occur with translation disabled (and would go to the wrong physical
> address).
>
> We need to ensure that the STMFD is executed before the MCR potentially
> changes the execution context. The DSB will ensure that in addition to
> ensuring completion of the write (i.e. it isn't left sat in a write
> buffer).
>
I see your point. Thanks for expanding it.

Regards,
Santosh
Hi,

> >>>>> +ENTRY(lpae_pgtables_remap_asm)
> >>>>> +	stmfd	sp!, {r4-r8, lr}
> >>>>> +
> >>>>> +	mrc	p15, 0, r8, c1, c0, 0	@ read control reg
> >>>>> +	bic	ip, r8, #CR_M		@ disable caches and MMU
> >>>>> +	mcr	p15, 0, ip, c1, c0, 0
> >>>>> +	dsb
> >>>>> +	isb
> >>>>
> >>>> Shouldn't the DSB be between the STMFD and the MCR (given the SP doesn't
> >>>> point to an idmap/physical address)?
> >>>>
> >>>> I don't see why we need a DSB after the write to the SCTLR.
> >>>>
> >> dsb can be moved up after stmfd but leaving as above should be fine
> >> as well.
> >
> > I don't think that it's safe to leave it where it is. Currently the
> > STMFD could be reordered w.r.t. the cp15 accesses, and hence the write
> > may occur with translation disabled (and would go to the wrong physical
> > address).
> >
> > We need to ensure that the STMFD is executed before the MCR potentially
> > changes the execution context. The DSB will ensure that in addition to
> > ensuring completion of the write (i.e. it isn't left sat in a write
> > buffer).
> >
> I see your point. Thanks for expanding it.

It turns out that I was incorrect in my assertion, and the reordering I
suggested above can't happen. The ARMv7 ARM states:

	Any direct write to a system control register is guaranteed not
	to affect any instruction that appears, in program order, before
	the instruction that performed the direct write

Which means that the STMFD cannot be affected by the later cp15 write to
the SCTLR, and so the DSB does not need to be moved before the MCR.

I apologise for adding to the confusion there.

Mark.
On Thu, Apr 23, 2015 at 12:24:58PM +0100, Mark Rutland wrote: > Hi, > > > >>>>> +ENTRY(lpae_pgtables_remap_asm) > > >>>>> + stmfd sp!, {r4-r8, lr} > > >>>>> + > > >>>>> + mrc p15, 0, r8, c1, c0, 0 @ read control reg > > >>>>> + bic ip, r8, #CR_M @ disable caches and MMU > > >>>>> + mcr p15, 0, ip, c1, c0, 0 > > >>>>> + dsb > > >>>>> + isb > > >>>> > > >>>> Shouldn't the DSB be between the STMFD and the MCR (given the SP doesn't > > >>>> point to an idmap/physical address)? > > >>>> > > >>>> I don't see why we need a DSB after the write to the SCTLR. > > >>>> > > >> dsb can be moved up after stmfd but leaving as above should be fine > > >> as well. > > > > > > I don't think that it's safe to leave it where it is. Currently the > > > STMFD could be reordered w.r.t. the cp15 accesses, and hence the write > > > may occur with translation disabled (and would go to the wrong physical > > > address). > > > > > > We need to ensure that the STMFD is executed before the MCR potentially > > > changes the execution context. The DSB will ensure that in addition to > > > ensuring completion of the write (i.e. it isn't left sat in a write > > > buffer). > > > > > I see your point. Thanks for expanding it. > > It turns out that I was incorrect in my assertion, and the reordering I > suggested above can't happen. The ARMv7 ARM states: > > Any direct write to a system control register is guaranteed not > to affect any instruction that appears, in program > order, before the instruction that performed the direct write > > Which means that the STMFD cannot be affected by the later cp15 write to > the SCTLR, and so the DSB does not need to be moved before the MCR. > > I apologise for adding to the confusion there. So does this mean this patch gets an ack now?
Hi Russell,

> > It turns out that I was incorrect in my assertion, and the reordering I
> > suggested above can't happen. The ARMv7 ARM states:
> >
> > 	Any direct write to a system control register is guaranteed not
> > 	to affect any instruction that appears, in program
> > 	order, before the instruction that performed the direct write
> >
> > Which means that the STMFD cannot be affected by the later cp15 write to
> > the SCTLR, and so the DSB does not need to be moved before the MCR.
> >
> > I apologise for adding to the confusion there.
>
> So does this mean this patch gets an ack now?

I assumed there was going to be a respin for the CR_W change?

There's also the dodginess w.r.t. the page table walkers that I can't
see is solvable short of disabling the MMU prior to the flush, though I
understand you've NAKed that approach.

Thanks,
Mark.
On Wed, May 06, 2015 at 11:37:37AM +0100, Mark Rutland wrote: > Hi Russell, > > > > It turns out that I was incorrect in my assertion, and the reordering I > > > suggested above can't happen. The ARMv7 ARM states: > > > > > > Any direct write to a system control register is guaranteed not > > > to affect any instruction that appears, in program > > > order, before the instruction that performed the direct write > > > > > > Which means that the STMFD cannot be affected by the later cp15 write to > > > the SCTLR, and so the DSB does not need to be moved before the MCR. > > > > > > I apologise for adding to the confusion there. > > > > So does this mean this patch gets an ack now? > > I assumed there was going to be a respin for the CR_W change? > > There's also the dodginess w.r.t. the page table walkers that I can't > see is solvable short of disabling the MMU prior to the flush, though I > understand you've NAKed that approach. Why? Are you saying that after: + mrc p15, 0, r8, c1, c0, 0 @ read control reg + bic ip, r8, #CR_M @ disable caches and MMU + mcr p15, 0, ip, c1, c0, 0 + dsb + isb the page table walkers are still actively walking the page table? That to me sounds like a hardware bug. The point of this is to shut down the MMU, _then_ update the page tables, and _then_ to re-enable the MMU to explicitly avoid problems with the page table walkers. From what you're saying, this is just not possible, and we should just throw Keystone SoCs in the bin...
On Wed, May 06, 2015 at 12:33:13PM +0100, Russell King - ARM Linux wrote:
> On Wed, May 06, 2015 at 11:37:37AM +0100, Mark Rutland wrote:
> > Hi Russell,
> >
> > > > It turns out that I was incorrect in my assertion, and the reordering I
> > > > suggested above can't happen. The ARMv7 ARM states:
> > > >
> > > > 	Any direct write to a system control register is guaranteed not
> > > > 	to affect any instruction that appears, in program
> > > > 	order, before the instruction that performed the direct write
> > > >
> > > > Which means that the STMFD cannot be affected by the later cp15 write to
> > > > the SCTLR, and so the DSB does not need to be moved before the MCR.
> > > >
> > > > I apologise for adding to the confusion there.
> > >
> > > So does this mean this patch gets an ack now?
> >
> > I assumed there was going to be a respin for the CR_W change?
> >
> > There's also the dodginess w.r.t. the page table walkers that I can't
> > see is solvable short of disabling the MMU prior to the flush, though I
> > understand you've NAKed that approach.
>
> Why?

I was on about the pre-assembly portion:

	cr = get_cr();
	set_cr(cr & ~(CR_I | CR_C | CR_W));
	flush_cache_all();

With the MMU on at this point the page table walkers can race with the
set/way maintenance. It also relies on the compiler not making stack
accesses between the SCTLR write and the completion of flush_cache_all,
which is likely but not guaranteed. So this won't necessarily flush out
the data it seems to be intended to.

> Are you saying that after:
>
> +	mrc	p15, 0, r8, c1, c0, 0	@ read control reg
> +	bic	ip, r8, #CR_M		@ disable caches and MMU
> +	mcr	p15, 0, ip, c1, c0, 0
> +	dsb
> +	isb
>
> the page table walkers are still actively walking the page table?
>
> That to me sounds like a hardware bug. The point of this is to shut
> down the MMU, _then_ update the page tables, and _then_ to re-enable
> the MMU to explicitly avoid problems with the page table walkers.

I agree that after this point it would be a bug for the page table
walkers to make cacheable accesses.

Thanks,
Mark.
On Wed, May 06, 2015 at 04:33:38PM +0100, Mark Rutland wrote: > On Wed, May 06, 2015 at 12:33:13PM +0100, Russell King - ARM Linux wrote: > > On Wed, May 06, 2015 at 11:37:37AM +0100, Mark Rutland wrote: > > > Hi Russell, > > > > > > > > It turns out that I was incorrect in my assertion, and the reordering I > > > > > suggested above can't happen. The ARMv7 ARM states: > > > > > > > > > > Any direct write to a system control register is guaranteed not > > > > > to affect any instruction that appears, in program > > > > > order, before the instruction that performed the direct write > > > > > > > > > > Which means that the STMFD cannot be affected by the later cp15 write to > > > > > the SCTLR, and so the DSB does not need to be moved before the MCR. > > > > > > > > > > I apologise for adding to the confusion there. > > > > > > > > So does this mean this patch gets an ack now? > > > > > > I assumed there was going to be a respin for the CR_W change? > > > > > > There's also the dodginess w.r.t. the page table walkers that I can't > > > see is solvable short of disabling the MMU prior to the flush, though I > > > understand you've NAKed that approach. > > > > Why? > > I was on about the pre-assembly portion: > > cr = get_cr(); > set_cr(cr & ~(CR_I | CR_C | CR_W)); > flush_cache_all(); > > With the MMU on at this point the page table walkers can race with the > set/way maintenance. It also relies on the compiler not making stack > accesses between the SCTLR write and the completion of flush_cache_all, > which is likely but not guranteed. Yes, I suppose you're right, but there's really no other way to do this other than coding up CPU specific code. What we'd need to do is to code up a specific assembly routine which disables the I and C control register flags, flushes the cache, jumps to the physical alias and then disables the MMU all without touching memory. That's far too much effort. Like I say, maybe we should just bin Keystone for this crap... I'm not interested in trying to "fix" this code any further - as I said earlier, it took quite a while to get this code working on Keystone, which is really all we care about. Anyone else who copies the abortion that TI made in Keystone should be shot. :) While the community has the luxury of saying "we don't like it, it's TI's problem" - which is what was done when the problems were first pointed out by Will, the net result is that this problem became my problem to try and sort out. So, if you have a better idea how to fix this, I'm all ears. What I'm certain of though is that this is an improvement over what's in the kernel today, and so should go in irrespective of whether it's totally bullet-proof or not.
Hi Russell, > > cr = get_cr(); > > set_cr(cr & ~(CR_I | CR_C | CR_W)); > > flush_cache_all(); > > > > With the MMU on at this point the page table walkers can race with the > > set/way maintenance. It also relies on the compiler not making stack > > accesses between the SCTLR write and the completion of flush_cache_all, > > which is likely but not guranteed. > > Yes, I suppose you're right, but there's really no other way to do > this other than coding up CPU specific code. > > What we'd need to do is to code up a specific assembly routine which > disables the I and C control register flags, flushes the cache, jumps > to the physical alias and then disables the MMU all without touching > memory. That's far too much effort. Like I say, maybe we should just > bin Keystone for this crap... > > I'm not interested in trying to "fix" this code any further - as I > said earlier, it took quite a while to get this code working on > Keystone, which is really all we care about. Anyone else who copies > the abortion that TI made in Keystone should be shot. :) > > While the community has the luxury of saying "we don't like it, it's > TI's problem" - which is what was done when the problems were first > pointed out by Will, the net result is that this problem became my > problem to try and sort out. > > So, if you have a better idea how to fix this, I'm all ears. > > What I'm certain of though is that this is an improvement over what's > in the kernel today, and so should go in irrespective of whether it's > totally bullet-proof or not. I don't disagree with that. :) w.r.t. "better" ideas, my initial thought was that we could move the SCTLR.{C,M} clearing into lpae_pgtables_remap_asm, along with a call to v7_flush_dcache_all (as it should be in the physical mapping). So long as the kernel text was initially clean to the PoC I'd expect that to work, but I understood from your initial reply you'd tried that and something went wrong. Perhaps we can look into that later. Thanks, Mark.
On Wed, May 06, 2015 at 05:14:06PM +0100, Mark Rutland wrote:
> > >	cr = get_cr();
> > >	set_cr(cr & ~(CR_I | CR_C | CR_W));
> > >	flush_cache_all();
> > >
> > > With the MMU on at this point the page table walkers can race with the
> > > set/way maintenance. It also relies on the compiler not making stack
> > > accesses between the SCTLR write and the completion of flush_cache_all,
> > > which is likely but not guranteed.
> >
> > Yes, I suppose you're right, but there's really no other way to do
> > this other than coding up CPU specific code.
> >
> > What we'd need to do is to code up a specific assembly routine which
> > disables the I and C control register flags, flushes the cache, jumps
> > to the physical alias and then disables the MMU all without touching
> > memory. That's far too much effort. Like I say, maybe we should just
> > bin Keystone for this crap...
> >
> > I'm not interested in trying to "fix" this code any further - as I
> > said earlier, it took quite a while to get this code working on
> > Keystone, which is really all we care about. Anyone else who copies
> > the abortion that TI made in Keystone should be shot. :)
> >
> > While the community has the luxury of saying "we don't like it, it's
> > TI's problem" - which is what was done when the problems were first
> > pointed out by Will, the net result is that this problem became my
> > problem to try and sort out.
> >
> > So, if you have a better idea how to fix this, I'm all ears.
> >
> > What I'm certain of though is that this is an improvement over what's
> > in the kernel today, and so should go in irrespective of whether it's
> > totally bullet-proof or not.
>
> I don't disagree with that. :)
>
> w.r.t. "better" ideas, my initial thought was that we could move the
> SCTLR.{C,M} clearing into lpae_pgtables_remap_asm, along with a call to
> v7_flush_dcache_all (as it should be in the physical mapping). So long
> as the kernel text was initially clean to the PoC I'd expect that to
> work, but I understood from your initial reply you'd tried that and
> something went wrong.
>
> Perhaps we can look into that later.

I think a comment summarising the conclusion of this thread above the
cache flush and rmk's choice of adjective to describe the keystone SoC
would be sufficient.

Will
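For illustration only, here is one possible shape for the comment Will asks for, placed above the flush in early_paging_init(). This is not the wording that was merged; it simply summarises the thread's conclusion, and it drops CR_W per the earlier ARMv7 discussion.

```c
	/*
	 * Architecturally this is not watertight: with the MMU still
	 * enabled the table walkers are not affected by SCTLR.C, so they
	 * can keep allocating page table data into the caches while the
	 * set/way maintenance below runs, and the compiler may touch the
	 * stack between set_cr() and the end of flush_cache_all().
	 * Doing better would need CPU-specific assembly that performs
	 * the flush with the MMU already disabled.  This path only runs
	 * on Keystone 2, so we live with it.
	 */
	cr = get_cr();
	set_cr(cr & ~(CR_I | CR_C));
	flush_cache_all();
```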
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig index b7644310236b..cc873e74643a 100644 --- a/arch/arm/mm/Kconfig +++ b/arch/arm/mm/Kconfig @@ -624,6 +624,10 @@ config ARM_LPAE If unsure, say N. +config ARM_PV_FIXUP + def_bool y + depends on ARM_LPAE && ARM_PATCH_PHYS_VIRT && ARCH_KEYSTONE + config ARCH_PHYS_ADDR_T_64BIT def_bool ARM_LPAE diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile index 19df9bd9a79f..290ec5056b3b 100644 --- a/arch/arm/mm/Makefile +++ b/arch/arm/mm/Makefile @@ -18,6 +18,7 @@ obj-$(CONFIG_MODULES) += proc-syms.o obj-$(CONFIG_ALIGNMENT_TRAP) += alignment.o obj-$(CONFIG_HIGHMEM) += highmem.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o +obj-$(CONFIG_ARM_PV_FIXUP) += pv-fixup-asm.o obj-$(CONFIG_CPU_ABRT_NOMMU) += abort-nommu.o obj-$(CONFIG_CPU_ABRT_EV4) += abort-ev4.o diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c index 4eab87eab5aa..addb7987c714 100644 --- a/arch/arm/mm/mmu.c +++ b/arch/arm/mm/mmu.c @@ -1387,7 +1387,11 @@ static void __init map_lowmem(void) } } -#if defined(CONFIG_ARM_LPAE) && defined(CONFIG_ARM_PATCH_PHYS_VIRT) +#ifdef CONFIG_ARM_PV_FIXUP +extern unsigned long __atags_pointer; +typedef void pgtables_remap(long long offset, unsigned long pgd, void *bdata); +pgtables_remap lpae_pgtables_remap_asm; + /* * early_paging_init() recreates boot time page table setup, allowing machines * to switch over to a high (>4G) address space on LPAE systems @@ -1395,35 +1399,30 @@ static void __init map_lowmem(void) void __init early_paging_init(const struct machine_desc *mdesc, struct proc_info_list *procinfo) { - pmdval_t pmdprot = procinfo->__cpu_mm_mmu_flags; - unsigned long map_start, map_end; + pgtables_remap *lpae_pgtables_remap; + unsigned long pa_pgd; + unsigned int cr; long long offset; - pgd_t *pgd0, *pgdk; - pud_t *pud0, *pudk, *pud_start; - pmd_t *pmd0, *pmdk; - phys_addr_t phys; - int i; + void *boot_data; if (!mdesc->pv_fixup) return; - /* remap kernel code and data */ - map_start = init_mm.start_code & PMD_MASK; - map_end = ALIGN(init_mm.brk, PMD_SIZE); - - /* get a handle on things... */ - pgd0 = pgd_offset_k(0); - pud_start = pud0 = pud_offset(pgd0, 0); - pmd0 = pmd_offset(pud0, 0); - - pgdk = pgd_offset_k(map_start); - pudk = pud_offset(pgdk, map_start); - pmdk = pmd_offset(pudk, map_start); - offset = mdesc->pv_fixup(); if (offset == 0) return; + /* + * Get the address of the remap function in the 1:1 identity + * mapping setup by the early page table assembly code. We + * must get this prior to the pv update. The following barrier + * ensures that this is complete before we fixup any P:V offsets. + */ + lpae_pgtables_remap = (pgtables_remap *)(unsigned long)__pa(lpae_pgtables_remap_asm); + pa_pgd = __pa(swapper_pg_dir); + boot_data = __va(__atags_pointer); + barrier(); + pr_info("Switching physical address space to 0x%08llx\n", (u64)PHYS_OFFSET + offset); @@ -1436,75 +1435,27 @@ void __init early_paging_init(const struct machine_desc *mdesc, (&__pv_table_end - &__pv_table_begin) << 2); /* - * Cache cleaning operations for self-modifying code - * We should clean the entries by MVA but running a - * for loop over every pv_table entry pointer would - * just complicate the code. - */ - flush_cache_louis(); - dsb(ishst); - isb(); - - /* - * FIXME: This code is not architecturally compliant: we modify - * the mappings in-place, indeed while they are in use by this - * very same code. This may lead to unpredictable behaviour of - * the CPU. - * - * Even modifying the mappings in a separate page table does - * not resolve this. 
- * - * The architecture strongly recommends that when a mapping is - * changed, that it is changed by first going via an invalid - * mapping and back to the new mapping. This is to ensure that - * no TLB conflicts (caused by the TLB having more than one TLB - * entry match a translation) can occur. However, doing that - * here will result in unmapping the code we are running. - */ - pr_warn("WARNING: unsafe modification of in-place page tables - tainting kernel\n"); - add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK); - - /* - * Remap level 1 table. This changes the physical addresses - * used to refer to the level 2 page tables to the high - * physical address alias, leaving everything else the same. - */ - for (i = 0; i < PTRS_PER_PGD; pud0++, i++) { - set_pud(pud0, - __pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER)); - pmd0 += PTRS_PER_PMD; - } - - /* - * Remap the level 2 table, pointing the mappings at the high - * physical address alias of these pages. - */ - phys = __pa(map_start); - do { - *pmdk++ = __pmd(phys | pmdprot); - phys += PMD_SIZE; - } while (phys < map_end); - - /* - * Ensure that the above updates are flushed out of the cache. - * This is not strictly correct; on a system where the caches - * are coherent with each other, but the MMU page table walks - * may not be coherent, flush_cache_all() may be a no-op, and - * this will fail. + * We changing not only the virtual to physical mapping, but + * also the physical addresses used to access memory. We need + * to flush all levels of cache in the system with caching + * disabled to ensure that all data is written back. We do this + * with caching and write buffering disabled to ensure that + * nothing is speculatively prefetched. */ + cr = get_cr(); + set_cr(cr & ~(CR_I | CR_C | CR_W)); flush_cache_all(); /* - * Re-write the TTBR values to point them at the high physical - * alias of the page tables. We expect __va() will work on - * cpu_get_pgd(), which returns the value of TTBR0. + * Fixup the page tables - this must be in the idmap region as + * we need to disable the MMU to do this safely, and hence it + * needs to be assembly. It's fairly simple, as we're using the + * temporary tables setup by the initial assembly code. */ - cpu_switch_mm(pgd0, &init_mm); - cpu_set_ttbr(1, __pa(pgd0) + TTBR1_OFFSET); + lpae_pgtables_remap(offset, pa_pgd, boot_data); - /* Finally flush any stale TLB values. */ - local_flush_bp_all(); - local_flush_tlb_all(); + /* Re-enable the caches */ + set_cr(cr); } #else diff --git a/arch/arm/mm/pv-fixup-asm.S b/arch/arm/mm/pv-fixup-asm.S new file mode 100644 index 000000000000..1867f3e43016 --- /dev/null +++ b/arch/arm/mm/pv-fixup-asm.S @@ -0,0 +1,88 @@ +/* + * Copyright (C) 2015 Russell King + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ * + * This assembly is required to safely remap the physical address space + * for Keystone 2 + */ +#include <linux/linkage.h> +#include <asm/asm-offsets.h> +#include <asm/cp15.h> +#include <asm/memory.h> +#include <asm/pgtable.h> + + .section ".idmap.text", "ax" + +#define L1_ORDER 3 +#define L2_ORDER 3 + +ENTRY(lpae_pgtables_remap_asm) + stmfd sp!, {r4-r8, lr} + + mrc p15, 0, r8, c1, c0, 0 @ read control reg + bic ip, r8, #CR_M @ disable caches and MMU + mcr p15, 0, ip, c1, c0, 0 + dsb + isb + + /* Update level 2 entries covering the kernel */ + ldr r6, =(_end - 1) + add r7, r2, #0x1000 + add r6, r7, r6, lsr #SECTION_SHIFT - L2_ORDER + add r7, r7, #PAGE_OFFSET >> (SECTION_SHIFT - L2_ORDER) +1: ldrd r4, [r7] + adds r4, r4, r0 + adc r5, r5, r1 + strd r4, [r7], #1 << L2_ORDER + cmp r7, r6 + bls 1b + + /* Update level 2 entries for the boot data */ + add r7, r2, #0x1000 + add r7, r7, r3, lsr #SECTION_SHIFT - L2_ORDER + bic r7, r7, #(1 << L2_ORDER) - 1 + ldrd r4, [r7] + adds r4, r4, r0 + adc r5, r5, r1 + strd r4, [r7], #1 << L2_ORDER + ldrd r4, [r7] + adds r4, r4, r0 + adc r5, r5, r1 + strd r4, [r7] + + /* Update level 1 entries */ + mov r6, #4 + mov r7, r2 +2: ldrd r4, [r7] + adds r4, r4, r0 + adc r5, r5, r1 + strd r4, [r7], #1 << L1_ORDER + subs r6, r6, #1 + bne 2b + + mrrc p15, 0, r4, r5, c2 @ read TTBR0 + adds r4, r4, r0 @ update physical address + adc r5, r5, r1 + mcrr p15, 0, r4, r5, c2 @ write back TTBR0 + mrrc p15, 1, r4, r5, c2 @ read TTBR1 + adds r4, r4, r0 @ update physical address + adc r5, r5, r1 + mcrr p15, 1, r4, r5, c2 @ write back TTBR1 + + dsb + + mov ip, #0 + mcr p15, 0, ip, c7, c5, 0 @ I+BTB cache invalidate + mcr p15, 0, ip, c8, c7, 0 @ local_flush_tlb_all() + dsb + isb + + mcr p15, 0, r8, c1, c0, 0 @ re-enable MMU + dsb + isb + + ldmfd sp!, {r4-r8, pc} +ENDPROC(lpae_pgtables_remap_asm)
Re-implement the physical address space switching to be architecturally
compliant. This involves flushing the caches, disabling the MMU, and
only then updating the page tables. Once that is complete, the system
can be brought back up again.

Since we disable the MMU, we need to do the update in assembly code.
Luckily, the entries which need updating are fairly trivial, and are
all set up by the early assembly code. We can merely adjust each entry
by the delta required.

Not only does this fix the code to be architecturally compliant, but it
fixes a couple of bugs too:

1. The original code would only ever update the first L2 entry covering
   a fraction of the kernel; the remainder were left untouched.
2. The L2 entries covering the DTB blob were likewise untouched.

This solution fixes up all entries.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 arch/arm/mm/Kconfig        |   4 ++
 arch/arm/mm/Makefile       |   1 +
 arch/arm/mm/mmu.c          | 119 +++++++++++++--------------------------------
 arch/arm/mm/pv-fixup-asm.S |  88 +++++++++++++++++++++++++++++++++
 4 files changed, 128 insertions(+), 84 deletions(-)
 create mode 100644 arch/arm/mm/pv-fixup-asm.S
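As a worked example of "adjust each entry by the delta required": on Keystone 2 the same DDR is visible at a 32-bit alias and at its true 36-bit address, so the fix-up is one 64-bit addition per descriptor. The constants below are assumptions about the Keystone 2 memory map (not taken from the patch), and the descriptor value is made up.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Assumed Keystone 2 views of the same DDR. */
	const uint64_t low_alias  = 0x080000000ULL;		/* 32-bit alias     */
	const uint64_t high_alias = 0x800000000ULL;		/* true 36-bit view */
	const uint64_t delta      = high_alias - low_alias;	/* 0x780000000      */

	/* A made-up LPAE block entry: valid block (bits [1:0] = 01), Access
	 * Flag set (bit 10), output address at the low alias. */
	uint64_t desc = low_alias | (1ULL << 10) | 0x1ULL;

	/* The per-entry fix-up: delta has its low 21 bits clear, so the
	 * attribute bits below bit 12 are untouched and no carry reaches them. */
	desc += delta;

	printf("delta = %#llx, remapped descriptor = %#llx\n",
	       (unsigned long long)delta, (unsigned long long)desc);
	return 0;
}
```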