Message ID | 20230327163203.2918455-1-evan@rivosinc.com (mailing list archive)
---|---
Series | RISC-V Hardware Probing User Interface
On Mon, Mar 27, 2023 at 09:31:57AM -0700, Evan Green wrote:

Hey Evan,

Patchwork has a rake of complaints about the series unfortunately:
https://patchwork.kernel.org/project/linux-riscv/list/?series=734234

Some of the checkpatch whinging may be spurious, but there's some
definitely valid stuff in there!

> Evan Green (6):
>   RISC-V: Move struct riscv_cpuinfo to new header
>   RISC-V: Add a syscall for HW probing
>   RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
>   RISC-V: hwprobe: Support probing of misaligned access performance
>   selftests: Test the new RISC-V hwprobe interface
>   RISC-V: Add hwprobe vDSO function and data

And this one breaks the build for !MMU kernels unfortunately.

Thanks,
Conor.
On Monday, 27 March 2023 at 18:31:57 CEST, Evan Green wrote:
>
> There's been a bunch of off-list discussions about this, including at
> Plumbers. The original plan was to do something involving providing an
> ISA string to userspace, but ISA strings just aren't sufficient for a
> stable ABI any more: in order to parse an ISA string users need the
> version of the specifications that the string is written to, the version
> of each extension (sometimes at a finer granularity than the RISC-V
> releases/versions encode), and the expected use case for the ISA string
> (ie, is it a U-mode or M-mode string). That's a lot of complexity to
> try and keep ABI compatible and it's probably going to continue to grow,
> as even if there's no more complexity in the specifications we'll have
> to deal with the various ISA string parsing oddities that end up all
> over userspace.
>
> Instead this patch set takes a very different approach and provides a set
> of key/value pairs that encode various bits about the system. The big
> advantage here is that we can clearly define what these mean so we can
> ensure ABI stability, but it also allows us to encode information that's
> unlikely to ever appear in an ISA string (see the misaligned access
> performance, for example). The resulting interface looks a lot like
> what arm64 and x86 do, and will hopefully fit well into something like
> ACPI in the future.
>
> The actual user interface is a syscall, with a vDSO function in front of
> it. The vDSO function can answer some queries without a syscall at all,
> and falls back to the syscall for cases it doesn't have answers to.
> Currently we prepopulate it with an array of answers for all keys and
> a CPU set of "all CPUs". This can be adjusted as necessary to provide
> fast answers to the most common queries.
>
> An example series in glibc exposing this syscall and using it in an
> ifunc selector for memcpy can be found at [1]. I'm about to send a v2
> of that series out that incorporates the vDSO function.
>
> I was asked about the performance delta between this and something like
> sysfs. I created a small test program [2] and ran it on a Nezha D1
> Allwinner board. Doing each operation 100000 times and dividing, these
> operations take the following amount of time:
> - open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
> - access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
> - riscv_hwprobe() vDSO and syscall: 0.0094us
> - riscv_hwprobe() vDSO with no syscall: 0.0091us

Looks like this series spawned a thread on one of the riscv-lists [0].

As auxvals were mentioned in that thread, I was wondering what's the
difference between doing a new syscall vs. putting the keys + values as
architecture auxvec elements [1]?

I'm probably missing some simple issue, but from looking at that stuff I
fathom RISCV_HWPROBE_KEY_BASE_BEHAVIOR could also just be
AT_RISCV_BASE_BEHAVIOR?

Heiko

[0] https://lists.riscv.org/g/sig-toolchains/topic/97886491
[1] https://elixir.bootlin.com/linux/latest/source/arch/riscv/include/uapi/asm/auxvec.h
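[Editor's note: for comparison with the proposed syscall, the auxvec approach Heiko raises would look roughly like the sketch below. AT_RISCV_BASE_BEHAVIOR and its numeric value are hypothetical, taken only from the suggestion in this mail; they do not exist in the kernel's uapi headers.]

```c
/*
 * Hypothetical sketch of the auxvec alternative discussed above:
 * the kernel would copy a base-behavior bitmask into every new
 * process's aux vector, and userspace would read it with getauxval().
 * AT_RISCV_BASE_BEHAVIOR and its value are assumptions for illustration.
 */
#include <stdio.h>
#include <sys/auxv.h>

#define AT_RISCV_BASE_BEHAVIOR   64        /* assumed, not a real define */
#define RISCV_BASE_BEHAVIOR_IMA  (1UL << 0)

int main(void)
{
	unsigned long base = getauxval(AT_RISCV_BASE_BEHAVIOR);

	if (base & RISCV_BASE_BEHAVIOR_IMA)
		printf("base ISA provides the IMA behavior\n");
	return 0;
}
```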
On Tue, Mar 28, 2023 at 1:35 PM Heiko Stübner <heiko@sntech.de> wrote:
>
> On Monday, 27 March 2023 at 18:31:57 CEST, Evan Green wrote:
> >
> > There's been a bunch of off-list discussions about this, including at
> > Plumbers. The original plan was to do something involving providing an
> > ISA string to userspace, but ISA strings just aren't sufficient for a
> > stable ABI any more: in order to parse an ISA string users need the
> > version of the specifications that the string is written to, the version
> > of each extension (sometimes at a finer granularity than the RISC-V
> > releases/versions encode), and the expected use case for the ISA string
> > (ie, is it a U-mode or M-mode string). That's a lot of complexity to
> > try and keep ABI compatible and it's probably going to continue to grow,
> > as even if there's no more complexity in the specifications we'll have
> > to deal with the various ISA string parsing oddities that end up all
> > over userspace.
> >
> > Instead this patch set takes a very different approach and provides a set
> > of key/value pairs that encode various bits about the system. The big
> > advantage here is that we can clearly define what these mean so we can
> > ensure ABI stability, but it also allows us to encode information that's
> > unlikely to ever appear in an ISA string (see the misaligned access
> > performance, for example). The resulting interface looks a lot like
> > what arm64 and x86 do, and will hopefully fit well into something like
> > ACPI in the future.
> >
> > The actual user interface is a syscall, with a vDSO function in front of
> > it. The vDSO function can answer some queries without a syscall at all,
> > and falls back to the syscall for cases it doesn't have answers to.
> > Currently we prepopulate it with an array of answers for all keys and
> > a CPU set of "all CPUs". This can be adjusted as necessary to provide
> > fast answers to the most common queries.
> >
> > An example series in glibc exposing this syscall and using it in an
> > ifunc selector for memcpy can be found at [1]. I'm about to send a v2
> > of that series out that incorporates the vDSO function.
> >
> > I was asked about the performance delta between this and something like
> > sysfs. I created a small test program [2] and ran it on a Nezha D1
> > Allwinner board. Doing each operation 100000 times and dividing, these
> > operations take the following amount of time:
> > - open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
> > - access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
> > - riscv_hwprobe() vDSO and syscall: 0.0094us
> > - riscv_hwprobe() vDSO with no syscall: 0.0091us
>
> Looks like this series spawned a thread on one of the riscv-lists [0].
>
> As auxvals were mentioned in that thread, I was wondering what's the
> difference between doing a new syscall vs. putting the keys + values as
> architecture auxvec elements [1]?

The auxvec approach would also work. The primary difference is that
auxvec bits are actively copied into every new process, forever. If you
predict a slow pace of new bits coming in, the auxvec approach probably
makes more sense. This series was born out of a prediction that this set
of "stuff" was going to be larger than on traditional x86/ARM
architectures, fiddly (ie bits possibly representing specific versions of
various extensions), evolving regularly over time, and heterogeneous
between cores. With that sort of rubber band ball in mind, a key/value
interface seemed to make more sense.

-Evan
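[Editor's note: a minimal sketch of how userspace might query the key/value interface described in the cover letter, calling the proposed syscall directly rather than through the glibc wrapper or vDSO front end. The struct layout, key name, and "NULL/0 means all CPUs" shortcut follow the series description; the syscall number and the numeric key value are assumptions for illustration and may differ from the final ABI.]

```c
/*
 * Sketch of querying the proposed riscv_hwprobe() interface via syscall(2).
 * The uapi constants below mirror the series description but are assumed
 * values, not a frozen ABI.
 */
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

struct riscv_hwprobe {
	int64_t  key;
	uint64_t value;
};

#define RISCV_HWPROBE_KEY_BASE_BEHAVIOR  3          /* assumed key number */
#define RISCV_HWPROBE_BASE_BEHAVIOR_IMA  (1ULL << 0)

#ifndef __NR_riscv_hwprobe
#define __NR_riscv_hwprobe 258                      /* assumed syscall number */
#endif

int main(void)
{
	struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR };

	/* cpusetsize = 0 and cpus = NULL ask about all online CPUs. */
	long ret = syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0);

	if (ret == 0 && (pair.value & RISCV_HWPROBE_BASE_BEHAVIOR_IMA))
		printf("all CPUs provide the IMA base behavior\n");
	return 0;
}
```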
On Mon, Mar 27, 2023 at 11:34 PM Conor Dooley <conor.dooley@microchip.com> wrote:
>
> On Mon, Mar 27, 2023 at 09:31:57AM -0700, Evan Green wrote:
>
> Hey Evan,
>
> Patchwork has a rake of complaints about the series unfortunately:
> https://patchwork.kernel.org/project/linux-riscv/list/?series=734234
>
> Some of the checkpatch whinging may be spurious, but there's some
> definitely valid stuff in there!
>
> > Evan Green (6):
> >   RISC-V: Move struct riscv_cpuinfo to new header
> >   RISC-V: Add a syscall for HW probing
> >   RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
> >   RISC-V: hwprobe: Support probing of misaligned access performance
> >   selftests: Test the new RISC-V hwprobe interface
> >   RISC-V: Add hwprobe vDSO function and data
>
> And this one breaks the build for !MMU kernels unfortunately.

Drat! Ok, thanks for the heads up. I'll go track these down.

-Evan