Message ID | 20220809091558.14379-6-alexandru.elisei@arm.com
---|---
State | New, archived
Series | arm/arm64: Rework cache maintenance at boot
On Tue, Aug 09, 2022 at 10:15:44AM +0100, Alexandru Elisei wrote:
> With powerpc moving to the page allocator, there are no architectures left
> which use the physical allocator after the boot setup: arm, arm64,
> s390x and powerpc drain the physical allocator to initialize the page
> allocator; and x86 calls setup_vm() to drain the allocator for each of
> the tests that allocate memory.

Please put the motivation for this change in the commit message. I looked
ahead at the next patch to find it, but I'm not sure I agree with it. We
should be able to keep the locking even when used early, since we probably
need our locking to be something we can use early elsewhere anyway.

Thanks,
drew
Hi,

On Tue, Sep 20, 2022 at 10:45:53AM +0200, Andrew Jones wrote:
> On Tue, Aug 09, 2022 at 10:15:44AM +0100, Alexandru Elisei wrote:
> > With powerpc moving to the page allocator, there are no architectures left
> > which use the physical allocator after the boot setup: arm, arm64,
> > s390x and powerpc drain the physical allocator to initialize the page
> > allocator; and x86 calls setup_vm() to drain the allocator for each of
> > the tests that allocate memory.
>
> Please put the motivation for this change in the commit message. I looked
> ahead at the next patch to find it, but I'm not sure I agree with it. We
> should be able to keep the locking even when used early, since we probably
> need our locking to be something we can use early elsewhere anyway.

You are correct, the commit message doesn't explain why locking is removed,
which makes the commit confusing. I will try to do a better job for the
next iteration (if we decide to keep this patch).

I removed locking because, by the end of the series, the physical allocator
will end up being used only by arm64 to create the idmap, which is done on
the boot CPU and with the MMU off. After that, the translation table
allocator functions will use the page allocator, which can be used
concurrently.

Looking at the spinlock implementation, spin_lock() doesn't protect against
concurrent accesses when the MMU is disabled (lock->v is unconditionally
set to 1). This means that spin_lock() does not work (in the sense that it
doesn't protect against concurrent accesses) on the boot path, which
doesn't need a spinlock anyway, because no secondaries are online. It also
means that spinlocks don't work when AUXINFO_MMU_OFF is set. So for the
sake of simplicity I preferred to drop it entirely.

Thanks,
Alex

>
> Thanks,
> drew
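For reference, the fallback being described looks roughly like the sketch
below; it is a simplified paraphrase of the arm spinlock in kvm-unit-tests
(the actual code uses hand-written exclusives, so details may differ):

#include <stdbool.h>

struct spinlock {
	int v;
};

extern bool mmu_enabled(void);	/* provided by lib/arm/mmu.c */

void spin_lock(struct spinlock *lock)
{
	if (!mmu_enabled()) {
		/*
		 * Exclusives may fault or act as NOPs on Device memory,
		 * so the lock is only marked as taken: this path provides
		 * no mutual exclusion at all.
		 */
		lock->v = 1;
		return;
	}

	/*
	 * MMU on: memory is Normal, so the lock can really be acquired,
	 * e.g. with an LDXR/STXR loop or a compiler atomic as here.
	 */
	while (__sync_lock_test_and_set(&lock->v, 1))
		;
}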
On Tue, Sep 20, 2022 at 02:20:48PM +0100, Alexandru Elisei wrote:
> Hi,
>
> On Tue, Sep 20, 2022 at 10:45:53AM +0200, Andrew Jones wrote:
> > On Tue, Aug 09, 2022 at 10:15:44AM +0100, Alexandru Elisei wrote:
> > > With powerpc moving to the page allocator, there are no architectures left
> > > which use the physical allocator after the boot setup: arm, arm64,
> > > s390x and powerpc drain the physical allocator to initialize the page
> > > allocator; and x86 calls setup_vm() to drain the allocator for each of
> > > the tests that allocate memory.
> >
> > Please put the motivation for this change in the commit message. I looked
> > ahead at the next patch to find it, but I'm not sure I agree with it. We
> > should be able to keep the locking even when used early, since we probably
> > need our locking to be something we can use early elsewhere anyway.
>
> You are correct, the commit message doesn't explain why locking is removed,
> which makes the commit confusing. I will try to do a better job for the
> next iteration (if we decide to keep this patch).
>
> I removed locking because, by the end of the series, the physical allocator
> will end up being used only by arm64 to create the idmap, which is done on

If only arm, and no unit tests, needs the phys allocator, then it can be
integrated with whatever arm is using it for and removed from the general
lib.

> the boot CPU and with the MMU off. After that, the translation table
> allocator functions will use the page allocator, which can be used
> concurrently.
>
> Looking at the spinlock implementation, spin_lock() doesn't protect against
> concurrent accesses when the MMU is disabled (lock->v is unconditionally
> set to 1). This means that spin_lock() does not work (in the sense that it
> doesn't protect against concurrent accesses) on the boot path, which
> doesn't need a spinlock anyway, because no secondaries are online. It also
> means that spinlocks don't work when AUXINFO_MMU_OFF is set. So for the
> sake of simplicity I preferred to drop it entirely.

If other architectures or unit tests have / could have uses for the
phys allocator then we should either document that it doesn't have
locks or keep the locks, and arm will just know that they don't work,
but also that they don't need to for its purposes.

Finally, if we drop the locks and arm doesn't have any other places where
we use locks without the MMU enabled, then we can change the lock
implementation to not have the no-mmu fallback, maybe by switching to the
generic implementation as the other architectures have done.

Thanks,
drew
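For comparison, the generic implementation mentioned here (the one in
lib/asm-generic/spinlock.h) is essentially an unconditional atomic
test-and-set with no MMU check; this is a sketch from memory, so the exact
code in the tree may differ:

struct spinlock {
	unsigned int v;
};

static inline void spin_lock(struct spinlock *lock)
{
	/* Always atomic; assumes the memory type supports exclusives. */
	while (__sync_lock_test_and_set(&lock->v, 1))
		;
}

static inline void spin_unlock(struct spinlock *lock)
{
	__sync_lock_release(&lock->v);
}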
Hi,

On Tue, Sep 20, 2022 at 04:59:52PM +0200, Andrew Jones wrote:
> On Tue, Sep 20, 2022 at 02:20:48PM +0100, Alexandru Elisei wrote:
> > Hi,
> >
> > On Tue, Sep 20, 2022 at 10:45:53AM +0200, Andrew Jones wrote:
> > > On Tue, Aug 09, 2022 at 10:15:44AM +0100, Alexandru Elisei wrote:
> > > > With powerpc moving to the page allocator, there are no architectures left
> > > > which use the physical allocator after the boot setup: arm, arm64,
> > > > s390x and powerpc drain the physical allocator to initialize the page
> > > > allocator; and x86 calls setup_vm() to drain the allocator for each of
> > > > the tests that allocate memory.
> > >
> > > Please put the motivation for this change in the commit message. I looked
> > > ahead at the next patch to find it, but I'm not sure I agree with it. We
> > > should be able to keep the locking even when used early, since we probably
> > > need our locking to be something we can use early elsewhere anyway.
> >
> > You are correct, the commit message doesn't explain why locking is removed,
> > which makes the commit confusing. I will try to do a better job for the
> > next iteration (if we decide to keep this patch).
> >
> > I removed locking because, by the end of the series, the physical allocator
> > will end up being used only by arm64 to create the idmap, which is done on
>
> If only arm, and no unit tests, needs the phys allocator, then it can be
> integrated with whatever arm is using it for and removed from the general
> lib.

I kept the allocator in lib because I thought that RISC-V might have a use
for it. Since it's a RISC architecture, I was thinking that it might also
require software cache management around enabling/disabling the MMU. But in
the end it's up to you; it would be easy to move the physical allocator to
lib/arm if you think that is best.

> > the boot CPU and with the MMU off. After that, the translation table
> > allocator functions will use the page allocator, which can be used
> > concurrently.
> >
> > Looking at the spinlock implementation, spin_lock() doesn't protect against
> > concurrent accesses when the MMU is disabled (lock->v is unconditionally
> > set to 1). This means that spin_lock() does not work (in the sense that it
> > doesn't protect against concurrent accesses) on the boot path, which
> > doesn't need a spinlock anyway, because no secondaries are online. It also
> > means that spinlocks don't work when AUXINFO_MMU_OFF is set. So for the
> > sake of simplicity I preferred to drop it entirely.
>
> If other architectures or unit tests have / could have uses for the
> phys allocator then we should either document that it doesn't have
> locks or keep the locks, and arm will just know that they don't work,
> but also that they don't need to for its purposes.

I will write a comment explaining the baked-in assumptions for the
allocator.

>
> Finally, if we drop the locks and arm doesn't have any other places where
> we use locks without the MMU enabled, then we can change the lock
> implementation to not have the no-mmu fallback, maybe by switching to the
> generic implementation as the other architectures have done.
The architecture mandates that load-acquire/store-release instructions are
supported only on Normal memory (more precisely, Inner or Outer Shareable,
Inner Write-Back, Outer Write-Back Normal memory with Read and Write
allocation hints and not transient; ARM DDI 0487H.a, pages B2-211 and
B2-212).

If the AUXINFO_MMU_OFF flag is set, kvm-unit-tests doesn't enable the MMU
at boot, which means that all tests can be run with the MMU disabled. In
this case, all memory is Device-nGnRnE (instead of Normal). With an
implementation that doesn't take into account that spin_lock() might be
called with the MMU disabled, kvm-unit-tests would end up using exclusive
access instructions on memory which doesn't support them. This can have
various effects, all rather unpleasant, like causing an external abort or
treating the exclusive access instruction as a NOP (ARM DDI 0487H.a, page
B2-212).

I tested this on my rockpro64 board, with kvm-unit-tests built from current
master, the mmu_disabled() path removed from spin_lock() and the
AUXINFO_MMU_OFF flag set: all tests hang indefinitely, because
phys_alloc_init() uses a spinlock. It is conceivable that we could rework
the setup code to remove the usage of spinlocks, but there is still the
matter of tests needing one for synchronization. There is also the matter
of the uart needing one for puts. And report. And probably other places.

Out of curiosity, without setting the AUXINFO_MMU_OFF flag, I tried using
the generic version of the spinlock (I assume you mean the one from
lib/asm-generic/spinlock.h; I changed lib/arm64/asm/spinlock.h to include
the above header): selftest-setup hangs without displaying anything before
phys_alloc_init(), and I have no idea why that is.

In the current implementation, when AUXINFO_MMU_OFF is set, tests that
actually use more than one thread might end up being incorrect some of the
time, because spin_lock() doesn't protect against concurrent accesses.
That's pretty bad, but I think the alternative of all tests hanging
indefinitely is worse.

In my opinion, the current spinlock implementation is incorrect when the
MMU is disabled, but using a generic implementation is worse. I guess
that's another thing to put on the TODO list. The Arm ARM recommends
Lamport's bakery algorithm for mutual exclusion, and we could try to
implement that for the MMU-disabled case, but I don't see much interest at
the moment in running tests with the MMU disabled.

Thanks,
Alex

>
> Thanks,
> drew
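The bakery algorithm mentioned above relies only on regular loads and
stores, so it doesn't need exclusives; below is a minimal sketch, assuming
a fixed CPU count and ignoring ticket overflow, plus the barriers and cache
maintenance a real MMU-off implementation would also need:

#include <stdbool.h>

#define NR_CPUS 8	/* assumed fixed CPU count, for the sketch */

static volatile bool choosing[NR_CPUS];
static volatile unsigned int number[NR_CPUS];

void bakery_lock(unsigned int me)
{
	unsigned int i, max = 0;

	/* Take a ticket one larger than every ticket currently held. */
	choosing[me] = true;
	for (i = 0; i < NR_CPUS; i++)
		if (number[i] > max)
			max = number[i];
	number[me] = max + 1;
	choosing[me] = false;

	/* Wait out every CPU holding a smaller ticket (ties broken by id). */
	for (i = 0; i < NR_CPUS; i++) {
		while (choosing[i])
			;
		while (number[i] != 0 &&
		       (number[i] < number[me] ||
			(number[i] == number[me] && i < me)))
			;
	}
}

void bakery_unlock(unsigned int me)
{
	number[me] = 0;
}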
diff --git a/lib/alloc_phys.c b/lib/alloc_phys.c
index efb783b34002..2e0b9c079d1d 100644
--- a/lib/alloc_phys.c
+++ b/lib/alloc_phys.c
@@ -21,7 +21,6 @@ struct phys_alloc_region {
 
 static struct phys_alloc_region regions[PHYS_ALLOC_NR_REGIONS];
 static int nr_regions;
-static struct spinlock lock;
 static phys_addr_t base, top;
 
 #define DEFAULT_MINIMUM_ALIGNMENT 32
@@ -37,7 +36,6 @@ void phys_alloc_show(void)
 {
 	int i;
 
-	spin_lock(&lock);
 	printf("phys_alloc minimum alignment: %#" PRIx64 "\n", (u64)align_min);
 	for (i = 0; i < nr_regions; ++i)
 		printf("%016" PRIx64 "-%016" PRIx64 " [%s]\n",
@@ -46,24 +44,19 @@ void phys_alloc_show(void)
 			"USED");
 	printf("%016" PRIx64 "-%016" PRIx64 " [%s]\n",
 		(u64)base, (u64)(top - 1), "FREE");
-	spin_unlock(&lock);
 }
 
 void phys_alloc_init(phys_addr_t base_addr, phys_addr_t size)
 {
-	spin_lock(&lock);
 	base = base_addr;
 	top = base + size;
 	nr_regions = 0;
-	spin_unlock(&lock);
 }
 
 void phys_alloc_set_minimum_alignment(phys_addr_t align)
 {
 	assert(align && !(align & (align - 1)));
-	spin_lock(&lock);
 	align_min = align;
-	spin_unlock(&lock);
 }
 
 static void *memalign_early(size_t alignment, size_t sz)
@@ -76,8 +69,6 @@ static void *memalign_early(size_t alignment, size_t sz)
 
 	assert(align && !(align & (align - 1)));
 
-	spin_lock(&lock);
-
 	top_safe = top;
 
 	if (sizeof(long) == 4)
@@ -97,7 +88,6 @@ static void *memalign_early(size_t alignment, size_t sz)
 			"top=%#" PRIx64 ", top_safe=%#" PRIx64 "\n",
 			(u64)size_orig, (u64)align, (u64)size,
 			(u64)(top_safe - base), (u64)top, (u64)top_safe);
-		spin_unlock(&lock);
 		return NULL;
 	}
 
@@ -113,8 +103,6 @@ static void *memalign_early(size_t alignment, size_t sz)
 		warned = true;
 	}
 
-	spin_unlock(&lock);
-
 	return phys_to_virt(addr);
 }
 
@@ -124,10 +112,8 @@ void phys_alloc_get_unused(phys_addr_t *p_base, phys_addr_t *p_top)
 	*p_top = top;
 	if (base == top)
 		return;
-	spin_lock(&lock);
 	regions[nr_regions].base = base;
 	regions[nr_regions].size = top - base;
 	++nr_regions;
 	base = top;
-	spin_unlock(&lock);
 }
With powerpc moving to the page allocator, there are no architectures left
which use the physical allocator after the boot setup: arm, arm64, s390x
and powerpc drain the physical allocator to initialize the page allocator;
and x86 calls setup_vm() to drain the allocator for each of the tests that
allocate memory.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/alloc_phys.c | 14 --------------
 1 file changed, 14 deletions(-)
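For context, the "draining" described above amounts to handing the physical
allocator's unused range over to the page allocator at the end of boot
setup; a sketch modeled on the arm setup path, so names and details may
differ from the tree:

static void drain_phys_alloc(void)
{
	phys_addr_t base, top;

	/* Claim whatever the early allocator didn't hand out... */
	phys_alloc_get_unused(&base, &top);
	base = PAGE_ALIGN(base);
	top &= PAGE_MASK;

	/* ...and seed the page allocator with it. */
	page_alloc_init_area(0, base >> PAGE_SHIFT, top >> PAGE_SHIFT);
	page_alloc_ops_enable();
}

After this point nothing allocates from the physical allocator again, which
is the premise the patch relies on when removing the lock.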