Message ID | 20220120090918.2646626-1-atishp@rivosinc.com (mailing list archive) |
---|---|
Series | Sparse HART id support |
On Thu, 20 Jan 2022 01:09:12 PST (-0800), Atish Patra wrote:
> Currently, sparse hartid is not supported for Linux RISC-V for the following
> reasons.
> 1. Both the spinwait and ordered booting methods use
>    __cpu_up_stack/task_pointer, which are arrays of size NR_CPUS.
> 2. During early booting, any hartid greater than NR_CPUS is not booted at all.
> 3. riscv_cpuid_to_hartid_mask uses struct cpumask for generating the hartid
>    bitmap.
> 4. The SBI v0.2 implementation uses NR_CPUS as the maximum hartid number while
>    generating the hartmask.
>
> In order to support sparse hartid, hartid and NR_CPUS need to be
> disassociated; tying them together was logically incorrect anyway. NR_CPUS
> represents the maximum logical CPU id configured in the kernel, while the
> hartid represents the physical hartid stored in the mhartid CSR defined by the
> privileged specification. Thus, a hartid can have a much greater value than a
> logical cpuid.
>
> Currently, we have two methods of booting. In ordered booting, the booting
> hart brings up each non-booting hart one by one using the SBI HSM extension.
> The spinwait booting method relies on harts jumping to the Linux kernel
> randomly, and the boot hart is selected by a lottery. All other non-booting
> harts keep spinning on __cpu_up_stack/task_pointer until the boot hart
> initializes the data. Both these methods rely on __cpu_up_stack/task_pointer
> to set up the stack/task pointer. The spinwait method is mostly used to
> support older firmware without the SBI HSM extension and M-mode Linux. The
> ordered booting method is the preferred method for booting general Linux
> because it can support cpu hotplug and kexec.
>
> The first patch modifies the ordered booting method to use an opaque parameter
> already available in the HSM start API to set up the stack/task pointer. The
> third patch resolves issue #1 by limiting the usage of
> __cpu_up_stack/task_pointer to the spinwait-specific booting method.
> The fourth and fifth patches move the entire hart lottery selection and
> spinwait method to a separate config that can be disabled if required. This
> solves issue #2. The 6th patch solves issues #3 and #4 by removing
> riscv_cpuid_to_hartid_mask completely. All the SBI APIs directly pass a
> pointer to struct cpumask and the SBI implementation takes care of generating
> the hart bitmap from the cpumask.
>
> It is not trivial to support sparse hartid for the spinwait booting method,
> and there are no use cases for sparse hartid with the spinwait method either.
> Any platform with sparse hartid will probably require more advanced features
> such as cpu hotplug and kexec. Thus, the series supports sparse hartid via
> the ordered booting method only. To maintain backward compatibility, the
> spinwait booting method is currently enabled in defconfig so that M-mode Linux
> will continue to work. Any platform that requires sparse hartid must disable
> the spinwait method.
>
> This series also fixes the out-of-bounds access error[1] reported by Geert.
> The issue can be reproduced with SMP booting with NR_CPUS=4 on platforms with
> discontiguous hart numbering (HiFive Unleashed/Unmatched & PolarFire).
> The spinwait method should also be disabled for such configurations, where
> the NR_CPUS value is less than the maximum hartid in the platform.
>
> [1] https://lore.kernel.org/lkml/CAMuHMdUPWOjJfJohxLJefHOrJBtXZ0xfHQt4=hXpUXnasiN+AQ@mail.gmail.com/#t
>
> The series is based on the queue branch of kvm-riscv as it has kvm related
> changes as well. I have tested it on HiFive Unmatched and Qemu.
>
> Changes from v2->v3:
> 1. Rebased on linux-next
> 2. Removed the redundant variable in PATCH 1.
> 3. Added the reviewed-by/acked-by tags.
>
> Changes from v1->v2:
> 1. Fixed a few typos in Kconfig.
> 2. Moved the boot data structure offsets to asm-offsets.c
> 3. Removed the redundant config check in head.S
>
> Atish Patra (6):
>   RISC-V: Avoid using per cpu array for ordered booting
>   RISC-V: Do not print the SBI version during HSM extension boot print
>   RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method
>   RISC-V: Move the entire hart selection via lottery to SMP
>   RISC-V: Move spinwait booting method to its own config
>   RISC-V: Do not use cpumask data structure for hartid bitmap
>
>  arch/riscv/Kconfig                   |  14 ++
>  arch/riscv/include/asm/cpu_ops.h     |   2 -
>  arch/riscv/include/asm/cpu_ops_sbi.h |  25 ++++
>  arch/riscv/include/asm/sbi.h         |  19 +--
>  arch/riscv/include/asm/smp.h         |   2 -
>  arch/riscv/kernel/Makefile           |   3 +-
>  arch/riscv/kernel/asm-offsets.c      |   3 +
>  arch/riscv/kernel/cpu_ops.c          |  26 ++--
>  arch/riscv/kernel/cpu_ops_sbi.c      |  26 +++-
>  arch/riscv/kernel/cpu_ops_spinwait.c |  27 +++-
>  arch/riscv/kernel/head.S             |  35 ++---
>  arch/riscv/kernel/head.h             |   6 +-
>  arch/riscv/kernel/sbi.c              | 189 +++++++++++++++------------
>  arch/riscv/kernel/setup.c            |  10 --
>  arch/riscv/kernel/smpboot.c          |   2 +-
>  arch/riscv/kvm/mmu.c                 |   4 +-
>  arch/riscv/kvm/vcpu_sbi_replace.c    |  11 +-
>  arch/riscv/kvm/vcpu_sbi_v01.c        |  11 +-
>  arch/riscv/kvm/vmid.c                |   4 +-
>  arch/riscv/mm/cacheflush.c           |   5 +-
>  arch/riscv/mm/tlbflush.c             |   9 +-
>  21 files changed, 253 insertions(+), 180 deletions(-)
>  create mode 100644 arch/riscv/include/asm/cpu_ops_sbi.h

Thanks, these are on for-next.
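
For readers of the archive, the cpumask-to-hart-bitmap step that the cover letter describes for patch 6 can be pictured roughly as follows. This is a user-space sketch under stated assumptions, not the code from the series: `NCPU`, `cpu_to_hartid`, `struct hart_batch` and `cpus_to_hart_batches` are hypothetical names, the hartid values are made up, and a plain `unsigned int` stands in for struct cpumask. The only idea it tries to show is that logical CPU ids are translated to hartids first, and the hartids are then grouped into (base, mask) windows one machine word wide, so no array or bitmap has to be sized by the largest hartid.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NCPU 4	/* hypothetical number of logical CPUs */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical logical-cpu -> hartid table with sparse hartids. */
static const unsigned long cpu_to_hartid[NCPU] = { 0, 1, 256, 1025 };

/* One window of harts: bit i of hmask selects hart (hbase + i). */
struct hart_batch {
	unsigned long hbase;
	unsigned long hmask;
};

/*
 * Translate a bitmap of logical CPU ids (bit n = logical cpu n) into
 * (hbase, hmask) batches. Returns the number of batches written to out,
 * or -1 if more than max batches would be needed.
 */
static int cpus_to_hart_batches(unsigned int cpu_bits,
				struct hart_batch *out, int max)
{
	unsigned long hbase = 0, hmask = 0;
	int have = 0;	/* non-zero while a batch is being filled */
	int n = 0;

	assert(out != NULL);

	for (int cpu = 0; cpu < NCPU; cpu++) {
		if (!(cpu_bits & (1u << cpu)))
			continue;

		unsigned long hartid = cpu_to_hartid[cpu];

		/* Flush the current batch if this hart does not fit in it. */
		if (have && (hartid < hbase ||
			     hartid - hbase >= BITS_PER_ULONG)) {
			if (n == max)
				return -1;
			out[n].hbase = hbase;
			out[n].hmask = hmask;
			n++;
			have = 0;
		}

		/* Start a new batch based at this hartid if none is open. */
		if (!have) {
			hbase = hartid;
			hmask = 0;
			have = 1;
		}
		hmask |= 1UL << (hartid - hbase);
	}

	/* Emit the last, partially filled batch. */
	if (have) {
		if (n == max)
			return -1;
		out[n].hbase = hbase;
		out[n].hmask = hmask;
		n++;
	}
	return n;
}
```

With the made-up table above, selecting all four logical CPUs gives three batches: harts 0 and 1 share one window, while harts 256 and 1025 each need their own. A caller would issue one request per batch.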