Message ID | 20250207161939.46139-18-ajones@ventanamicro.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | riscv: Unaligned access speed probing fixes and skipping | expand |
Context | Check | Description |
---|---|---|
bjorn/pre-ci_am | success | Success |
bjorn/build-rv32-defconfig | success | build-rv32-defconfig |
bjorn/build-rv64-clang-allmodconfig | success | build-rv64-clang-allmodconfig |
bjorn/build-rv64-gcc-allmodconfig | success | build-rv64-gcc-allmodconfig |
bjorn/build-rv64-nommu-k210-defconfig | success | build-rv64-nommu-k210-defconfig |
bjorn/build-rv64-nommu-k210-virt | success | build-rv64-nommu-k210-virt |
bjorn/checkpatch | warning | checkpatch |
bjorn/dtb-warn-rv64 | success | dtb-warn-rv64 |
bjorn/header-inline | success | header-inline |
bjorn/kdoc | success | kdoc |
bjorn/module-param | success | module-param |
bjorn/verify-fixes | success | verify-fixes |
bjorn/verify-signedoff | success | verify-signedoff |
On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: > Probing unaligned accesses on boot is time consuming. Provide a > function which will be used to look up the access type in a table > by id registers. Vendors which provide table entries can then skip > the probing. The access checker in my experience is only time consuming on slow hardware. Hardware that supports fast unaligned accesses isn't really impacted by this? Avoiding a list of hardware that has slow/fast unaligned accesses in the kernel was the main reason for dynamically checking. We did introduce the config option to compile the kernel with assumed slow/fast accesses, which of course has the downside of recompiling the kernel and I assume that you already considered that. Instead of having a table in the kernel, something that would be more platform agnostic would be to have an extension that signals this information. That seems like it would accomplish the same goal and leverage the existing infrastructure in the kernel, albeit with the need to make a new extension. - Charlie
On Fri, Feb 07, 2025 at 05:22:52PM -0800, Charlie Jenkins wrote: > On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: > > Probing unaligned accesses on boot is time consuming. Provide a > > function which will be used to look up the access type in a table > > by id registers. Vendors which provide table entries can then skip > > the probing. > > The access checker in my experience is only time consuming on slow > hardware. Hardware that supports fast unaligned accesses isn't really > impacted by this? That's true, but... > Avoiding a list of hardware that has slow/fast > unaligned accesses in the kernel was the main reason for dynamically > checking. ...I'm not sure why we should try to avoid determining hardware support by its description when a description can be provided. > We did introduce the config option to compile the kernel with > assumed slow/fast accesses, which of course has the downside of > recompiling the kernel and I assume that you already considered that. yup > > Instead of having a table in the kernel, something that would be more > platform agnostic would be to have an extension that signals this > information. That seems like it would accomplish the same goal and > leverage the existing infrastructure in the kernel, albeit with the need > to make a new extension. Yes, I agree that another profile "named feature" may be the best approach. I'll consider proposing one, but [1] implies there may be some resistance to creating something like that. [1] https://github.com/riscv/riscv-isa-manual/issues/1611 Thanks, drew
On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote: > > On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: > > Probing unaligned accesses on boot is time consuming. Provide a > > function which will be used to look up the access type in a table > > by id registers. Vendors which provide table entries can then skip > > the probing. > > The access checker in my experience is only time consuming on slow > hardware. Hardware that supports fast unaligned accesses isn't really > impacted by this? Avoiding a list of hardware that has slow/fast > unaligned accesses in the kernel was the main reason for dynamically > checking. We did introduce the config option to compile the kernel with > assumed slow/fast accesses, which of course has the downside of > recompiling the kernel and I assume that you already considered that. The kconfig option does not align with the vision of running the same kernel image across platforms. > > Instead of having a table in the kernel, something that would be more > platform agnostic would be to have an extension that signals this > information. That seems like it would accomplish the same goal and > leverage the existing infrastructure in the kernel, albeit with the need > to make a new extension. > IMO, expecting an ISA extension to be defined for all possible microarchitectural choices is not going to scale so it is better to have infrastructure in kernel itself to infer microarchitectural choices based on RISC-V implementation ID. Regards, Anup
On 10/02/2025 11:16, Anup Patel wrote: > On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote: >> >> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: >>> Probing unaligned accesses on boot is time consuming. Provide a >>> function which will be used to look up the access type in a table >>> by id registers. Vendors which provide table entries can then skip >>> the probing. >> >> The access checker in my experience is only time consuming on slow >> hardware. Hardware that supports fast unaligned accesses isn't really >> impacted by this? Avoiding a list of hardware that has slow/fast >> unaligned accesses in the kernel was the main reason for dynamically >> checking. We did introduce the config option to compile the kernel with >> assumed slow/fast accesses, which of course has the downside of >> recompiling the kernel and I assume that you already considered that. > > The kconfig option does not align with the vision of running the same > kernel image across platforms. I'd would be advocating to remove compile time options as well and use another way to skip the probe (see below). > >> >> Instead of having a table in the kernel, something that would be more >> platform agnostic would be to have an extension that signals this >> information. That seems like it would accomplish the same goal and >> leverage the existing infrastructure in the kernel, albeit with the need >> to make a new extension. >> > > IMO, expecting an ISA extension to be defined for all possible > microarchitectural choices is not going to scale so it is better > to have infrastructure in kernel itself to infer microarchitectural > choices based on RISC-V implementation ID. Since adding an extension seems quite unlikely, and that a device-tree property is likely DT centric and not applicable to ACPI as well, was a command line argument considered ? Thanks, Clément > > Regards, > Anup > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote: > > > On 10/02/2025 11:16, Anup Patel wrote: > > On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote: > >> > >> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: > >>> Probing unaligned accesses on boot is time consuming. Provide a > >>> function which will be used to look up the access type in a table > >>> by id registers. Vendors which provide table entries can then skip > >>> the probing. > >> > >> The access checker in my experience is only time consuming on slow > >> hardware. Hardware that supports fast unaligned accesses isn't really > >> impacted by this? Avoiding a list of hardware that has slow/fast > >> unaligned accesses in the kernel was the main reason for dynamically > >> checking. We did introduce the config option to compile the kernel with > >> assumed slow/fast accesses, which of course has the downside of > >> recompiling the kernel and I assume that you already considered that. > > > > The kconfig option does not align with the vision of running the same > > kernel image across platforms. > > I'd would be advocating to remove compile time options as well and use > another way to skip the probe (see below). > > > > >> > >> Instead of having a table in the kernel, something that would be more > >> platform agnostic would be to have an extension that signals this > >> information. That seems like it would accomplish the same goal and > >> leverage the existing infrastructure in the kernel, albeit with the need > >> to make a new extension. > >> > > > > IMO, expecting an ISA extension to be defined for all possible > > microarchitectural choices is not going to scale so it is better > > to have infrastructure in kernel itself to infer microarchitectural > > choices based on RISC-V implementation ID. > > Since adding an extension seems quite unlikely, and that a device-tree > property is likely DT centric and not applicable to ACPI as well, was a > command line argument considered ? > I did consider adding a command line option in addition to the table, allowing platforms which neither have a table entry [yet] nor want to do the speed test, to set whatever they like. In the end, I dropped it, since I don't have a use case at this time. However, if we really don't want a table, then I can look into the command line option instead. Thanks, drew
On 10/02/2025 15:06, Andrew Jones wrote: > On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote: >> >> >> On 10/02/2025 11:16, Anup Patel wrote: >>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote: >>>> >>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: >>>>> Probing unaligned accesses on boot is time consuming. Provide a >>>>> function which will be used to look up the access type in a table >>>>> by id registers. Vendors which provide table entries can then skip >>>>> the probing. >>>> >>>> The access checker in my experience is only time consuming on slow >>>> hardware. Hardware that supports fast unaligned accesses isn't really >>>> impacted by this? Avoiding a list of hardware that has slow/fast >>>> unaligned accesses in the kernel was the main reason for dynamically >>>> checking. We did introduce the config option to compile the kernel with >>>> assumed slow/fast accesses, which of course has the downside of >>>> recompiling the kernel and I assume that you already considered that. >>> >>> The kconfig option does not align with the vision of running the same >>> kernel image across platforms. >> >> I'd would be advocating to remove compile time options as well and use >> another way to skip the probe (see below). >> >>> >>>> >>>> Instead of having a table in the kernel, something that would be more >>>> platform agnostic would be to have an extension that signals this >>>> information. That seems like it would accomplish the same goal and >>>> leverage the existing infrastructure in the kernel, albeit with the need >>>> to make a new extension. >>>> >>> >>> IMO, expecting an ISA extension to be defined for all possible >>> microarchitectural choices is not going to scale so it is better >>> to have infrastructure in kernel itself to infer microarchitectural >>> choices based on RISC-V implementation ID. >> >> Since adding an extension seems quite unlikely, and that a device-tree >> property is likely DT centric and not applicable to ACPI as well, was a >> command line argument considered ? >> > > I did consider adding a command line option in addition to the table, > allowing platforms which neither have a table entry [yet] nor want to do > the speed test, to set whatever they like. In the end, I dropped it, since > I don't have a use case at this time. However, if we really don't want a > table, then I can look into the command line option instead. Sorry if I wasn't clear, I wasn't considering this as a replacement for your table but rather as a replacement to Charlie's compile time define to skip misaligned speed probing since it is like "lpj=<x>". You can specify it on command line if you want to skip the loop time detection of loops per jiffies and have faster boot. Regarding your table, it feels like a bit going back to old hardcoded platform description ;). I think some kind of auto-detection of speed (not builtin the kernel) for platforms could be good as well to skip probing. A DT property also seems ok to me since the goal is to describe hardware. Would a common DT/ACPI property be appropriate ? The device_property API unified both so if we used some common property to describe the misaligned access speed (both in DT cpu node/ ACPI CPU device package), we could keep a single parsing method. But I'm no ACPI expert so I don't know if that really make sense. Thanks, Clément > > Thanks, > drew
On Mon, Feb 10, 2025 at 10:43:18AM +0100, Andrew Jones wrote: > On Fri, Feb 07, 2025 at 05:22:52PM -0800, Charlie Jenkins wrote: > > On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: > > > Probing unaligned accesses on boot is time consuming. Provide a > > > function which will be used to look up the access type in a table > > > by id registers. Vendors which provide table entries can then skip > > > the probing. > > > > The access checker in my experience is only time consuming on slow > > hardware. Hardware that supports fast unaligned accesses isn't really > > impacted by this? > > That's true, but... > > > Avoiding a list of hardware that has slow/fast > > unaligned accesses in the kernel was the main reason for dynamically > > checking. > > ...I'm not sure why we should try to avoid determining hardware support > by its description when a description can be provided. I worry about scalability of this. This to me seems like a slippery slope of hardcoding performance tables into the kernel. There are a lot of riscv vendors and allowing anybody to add a table to the kernel to dynamically change behavior specifically for their hardware could become a maintainability nightmare. Avoiding this maintainability issue was the motivation for the runtime checker. > > > We did introduce the config option to compile the kernel with > > assumed slow/fast accesses, which of course has the downside of > > recompiling the kernel and I assume that you already considered that. > > yup > > > > > Instead of having a table in the kernel, something that would be more > > platform agnostic would be to have an extension that signals this > > information. That seems like it would accomplish the same goal and > > leverage the existing infrastructure in the kernel, albeit with the need > > to make a new extension. > > Yes, I agree that another profile "named feature" may be the best > approach. I'll consider proposing one, but [1] implies there may be Yeah that thread does highlight the unfortunate way that riscv has evolved, but I hope that wouldn't be a prohibiting factor here. - Charlie > some resistance to creating something like that. > > [1] https://github.com/riscv/riscv-isa-manual/issues/1611 > > Thanks, > drew
On Mon, Feb 10, 2025 at 03:46:46PM +0530, Anup Patel wrote: > On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote: > > > > On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: > > > Probing unaligned accesses on boot is time consuming. Provide a > > > function which will be used to look up the access type in a table > > > by id registers. Vendors which provide table entries can then skip > > > the probing. > > > > The access checker in my experience is only time consuming on slow > > hardware. Hardware that supports fast unaligned accesses isn't really > > impacted by this? Avoiding a list of hardware that has slow/fast > > unaligned accesses in the kernel was the main reason for dynamically > > checking. We did introduce the config option to compile the kernel with > > assumed slow/fast accesses, which of course has the downside of > > recompiling the kernel and I assume that you already considered that. > > The kconfig option does not align with the vision of running the same > kernel image across platforms. I just don't think that vision is realistic. I am a proponent for compile time defines because ri ght now we are catering the kernel to both microcontrollers and for high performance platforms. I am in favor of having a set of configur ations that are ideal for these microcontrollers and a different set for high performance platforms. This is where the RVI profile s would ideally come in, having different configs for different profiles that target low performance/high performance. Compiler optimizations for extensions are not possib le to do by just having these different methods of selecting at runti me. By enabling extra extensions like the bitmanip extensions during compilation via a config flag we can optimize the entire kernel. It is not possible to push all optimizations off to runtime detection. > > > > > Instead of having a table in the kernel, something that would be more > > platform agnostic would be to have an extension that signals this > > information. That seems like it would accomplish the same goal and > > leverage the existing infrastructure in the kernel, albeit with the need > > to make a new extension. > > > > IMO, expecting an ISA extension to be defined for all possible > microarchitectural choices is not going to scale so it is better > to have infrastructure in kernel itself to infer microarchitectural > choices based on RISC-V implementation ID. How is keeping tables in the kernel for all microarchitectural details any more scalable than having extensions that do the same thing? I would argue that having it in the kernel is less scalable since it needs to be described for all implementation IDs, and all changes require going through the kernel review process. Dynamic probing avoids these issues. Having an extension has the one-time process of getting the extension into something like a profile, but then anybody could use it without needing a kernel patch. - Charlie > > Regards, > Anup
On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote: > > > On 10/02/2025 15:06, Andrew Jones wrote: > > On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote: > >> > >> > >> On 10/02/2025 11:16, Anup Patel wrote: > >>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote: > >>>> > >>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: > >>>>> Probing unaligned accesses on boot is time consuming. Provide a > >>>>> function which will be used to look up the access type in a table > >>>>> by id registers. Vendors which provide table entries can then skip > >>>>> the probing. > >>>> > >>>> The access checker in my experience is only time consuming on slow > >>>> hardware. Hardware that supports fast unaligned accesses isn't really > >>>> impacted by this? Avoiding a list of hardware that has slow/fast > >>>> unaligned accesses in the kernel was the main reason for dynamically > >>>> checking. We did introduce the config option to compile the kernel with > >>>> assumed slow/fast accesses, which of course has the downside of > >>>> recompiling the kernel and I assume that you already considered that. > >>> > >>> The kconfig option does not align with the vision of running the same > >>> kernel image across platforms. > >> > >> I'd would be advocating to remove compile time options as well and use > >> another way to skip the probe (see below). > >> > >>> > >>>> > >>>> Instead of having a table in the kernel, something that would be more > >>>> platform agnostic would be to have an extension that signals this > >>>> information. That seems like it would accomplish the same goal and > >>>> leverage the existing infrastructure in the kernel, albeit with the need > >>>> to make a new extension. > >>>> > >>> > >>> IMO, expecting an ISA extension to be defined for all possible > >>> microarchitectural choices is not going to scale so it is better > >>> to have infrastructure in kernel itself to infer microarchitectural > >>> choices based on RISC-V implementation ID. > >> > >> Since adding an extension seems quite unlikely, and that a device-tree > >> property is likely DT centric and not applicable to ACPI as well, was a > >> command line argument considered ? > >> > > > > I did consider adding a command line option in addition to the table, > > allowing platforms which neither have a table entry [yet] nor want to do > > the speed test, to set whatever they like. In the end, I dropped it, since > > I don't have a use case at this time. However, if we really don't want a > > table, then I can look into the command line option instead. > > Sorry if I wasn't clear, I wasn't considering this as a replacement for > your table but rather as a replacement to Charlie's compile time define > to skip misaligned speed probing since it is like "lpj=<x>". You can > specify it on command line if you want to skip the loop time detection > of loops per jiffies and have faster boot. Jesse sent out a patch for a kernel parameter to set the access speed to whatever is desired [1]. - Charlie > -} > -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */ > -static void __init check_unaligned_access_speed_all_cpus(void) > -{ > -} > -#endif > - > #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS > static void check_vector_unaligned_access(struct work_struct *work __always_unused) > { > @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway > } > #endif > > +static bool check_vector_unaligned_access_table(void) > +{ > + return false; > +} > + > static int riscv_online_cpu_vec(unsigned int cpu) > { > if (!has_vector()) { > @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu) > return 0; > } > > + if (check_vector_unaligned_access_table()) > + return 0; > + > #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS > if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN) > return 0; > @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void) > { > int cpu; > > - if (!check_unaligned_access_emulated_all_cpus()) > + if (!check_unaligned_access_table() && > + !check_unaligned_access_emulated_all_cpus()) > check_unaligned_access_speed_all_cpus(); > > if (!has_vector()) { > for_each_online_cpu(cpu) > per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED; > - } else if (!check_vector_unaligned_access_emulated_all_cpus() && > + } else if (!check_vector_unaligned_access_table() && > + !check_vector_unaligned_access_emulated_all_cpus() && > IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) { > kthread_run(vec_check_unaligned_access_speed_all_cpus, > NULL, "vec_check_unaligned_access_speed_all_cpus"); > > Regarding your table, it feels like a bit going back to old hardcoded > platform description ;). I think some kind of auto-detection of speed > (not builtin the kernel) for platforms could be good as well to skip > probing. > > A DT property also seems ok to me since the goal is to describe > hardware. Would a common DT/ACPI property be appropriate ? The > device_property API unified both so if we used some common property to > describe the misaligned access speed (both in DT cpu node/ ACPI CPU > device package), we could keep a single parsing method. But I'm no ACPI > expert so I don't know if that really make sense. > > Thanks, > > Clément > > > > > Thanks, > > drew >
On 10/02/2025 18:19, Charlie Jenkins wrote: > On Mon, Feb 10, 2025 at 03:46:46PM +0530, Anup Patel wrote: >> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote: >>> >>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: >>>> Probing unaligned accesses on boot is time consuming. Provide a >>>> function which will be used to look up the access type in a table >>>> by id registers. Vendors which provide table entries can then skip >>>> the probing. >>> >>> The access checker in my experience is only time consuming on slow >>> hardware. Hardware that supports fast unaligned accesses isn't really >>> impacted by this? Avoiding a list of hardware that has slow/fast >>> unaligned accesses in the kernel was the main reason for dynamically >>> checking. We did introduce the config option to compile the kernel with >>> assumed slow/fast accesses, which of course has the downside of >>> recompiling the kernel and I assume that you already considered that. >> >> The kconfig option does not align with the vision of running the same >> kernel image across platforms. > > I just don't think that vision is realistic. > > I am a proponent for compile time defines because ri ght now we are > catering the kernel to both microcontrollers and for high performance > platforms. I am in favor of having a set of configur ations that are > ideal for these microcontrollers and a different set for high > performance platforms. This is where the RVI profile s would ideally > come in, having different configs for different profiles that target low > performance/high performance. > > Compiler optimizations for extensions are not possib le to do by just > having these different methods of selecting at runti me. By enabling > extra extensions like the bitmanip extensions during compilation via a > config flag we can optimize the entire kernel. It is not possible to > push all optimizations off to runtime detection. While this might be true for the bitmanip extension and other extensions that the compiler can take advantage of, that isn't true for the misaligned speed probing code. The only meaningful misaligned access configuration option for kernel "speed" optimization is RISCV_EFFICIENT_UNALIGNED_ACCESS (which is ironically not easily selectable since it is under NON_PORTABLE). Currently all the config options that have been added around misaligned support "just" allows to get rid of the probing code at compile time and set a predefined speed. That does not really improve the kernel performance itself, just allows for a faster boot. That could as well be supported using a command line option as suggested. That being said, I agree that some additional configuration options should be added to enable additional extension support at compile time, enabling a faster kernel. Having a single image for all hardware is as you said not realistic but profiles configuration as you proposed might be the key for this support. Clément > >> >>> >>> Instead of having a table in the kernel, something that would be more >>> platform agnostic would be to have an extension that signals this >>> information. That seems like it would accomplish the same goal and >>> leverage the existing infrastructure in the kernel, albeit with the need >>> to make a new extension. >>> >> >> IMO, expecting an ISA extension to be defined for all possible >> microarchitectural choices is not going to scale so it is better >> to have infrastructure in kernel itself to infer microarchitectural >> choices based on RISC-V implementation ID. > > How is keeping tables in the kernel for all microarchitectural details > any more scalable than having extensions that do the same thing? I would > argue that having it in the kernel is less scalable since it needs to be > described for all implementation IDs, and all changes require going > through the kernel review process. Dynamic probing avoids these issues. > Having an extension has the one-time process of getting the extension > into something like a profile, but then anybody could use it without > needing a kernel patch. > > - Charlie > >> >> Regards, >> Anup > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
On 10/02/2025 18:20, Charlie Jenkins wrote: > On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote: >> >> >> On 10/02/2025 15:06, Andrew Jones wrote: >>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote: >>>> >>>> >>>> On 10/02/2025 11:16, Anup Patel wrote: >>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote: >>>>>> >>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: >>>>>>> Probing unaligned accesses on boot is time consuming. Provide a >>>>>>> function which will be used to look up the access type in a table >>>>>>> by id registers. Vendors which provide table entries can then skip >>>>>>> the probing. >>>>>> >>>>>> The access checker in my experience is only time consuming on slow >>>>>> hardware. Hardware that supports fast unaligned accesses isn't really >>>>>> impacted by this? Avoiding a list of hardware that has slow/fast >>>>>> unaligned accesses in the kernel was the main reason for dynamically >>>>>> checking. We did introduce the config option to compile the kernel with >>>>>> assumed slow/fast accesses, which of course has the downside of >>>>>> recompiling the kernel and I assume that you already considered that. >>>>> >>>>> The kconfig option does not align with the vision of running the same >>>>> kernel image across platforms. >>>> >>>> I'd would be advocating to remove compile time options as well and use >>>> another way to skip the probe (see below). >>>> >>>>> >>>>>> >>>>>> Instead of having a table in the kernel, something that would be more >>>>>> platform agnostic would be to have an extension that signals this >>>>>> information. That seems like it would accomplish the same goal and >>>>>> leverage the existing infrastructure in the kernel, albeit with the need >>>>>> to make a new extension. >>>>>> >>>>> >>>>> IMO, expecting an ISA extension to be defined for all possible >>>>> microarchitectural choices is not going to scale so it is better >>>>> to have infrastructure in kernel itself to infer microarchitectural >>>>> choices based on RISC-V implementation ID. >>>> >>>> Since adding an extension seems quite unlikely, and that a device-tree >>>> property is likely DT centric and not applicable to ACPI as well, was a >>>> command line argument considered ? >>>> >>> >>> I did consider adding a command line option in addition to the table, >>> allowing platforms which neither have a table entry [yet] nor want to do >>> the speed test, to set whatever they like. In the end, I dropped it, since >>> I don't have a use case at this time. However, if we really don't want a >>> table, then I can look into the command line option instead. >> >> Sorry if I wasn't clear, I wasn't considering this as a replacement for >> your table but rather as a replacement to Charlie's compile time define >> to skip misaligned speed probing since it is like "lpj=<x>". You can >> specify it on command line if you want to skip the loop time detection >> of loops per jiffies and have faster boot. > > Jesse sent out a patch for a kernel parameter to set the access speed to > whatever is desired [1]. Hey Charlie, Thanks but it seems you forgot to add the link ? Having configuration option + command line option seems like something particularly heavy for such feature. The ifdefery/config options involved in the misaligned probing code is already quite complicated. If another mean to specify the misaligned speed access is added, I think all configuration options to set the speed of accesses can then be removed and just keep the command line. That will certainly simplify the ifdef/config options. Clément > > - Charlie > >> -} >> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */ >> -static void __init check_unaligned_access_speed_all_cpus(void) >> -{ >> -} >> -#endif >> - >> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS >> static void check_vector_unaligned_access(struct work_struct *work __always_unused) >> { >> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway >> } >> #endif >> >> +static bool check_vector_unaligned_access_table(void) >> +{ >> + return false; >> +} >> + >> static int riscv_online_cpu_vec(unsigned int cpu) >> { >> if (!has_vector()) { >> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu) >> return 0; >> } >> >> + if (check_vector_unaligned_access_table()) >> + return 0; >> + >> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS >> if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN) >> return 0; >> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void) >> { >> int cpu; >> >> - if (!check_unaligned_access_emulated_all_cpus()) >> + if (!check_unaligned_access_table() && >> + !check_unaligned_access_emulated_all_cpus()) >> check_unaligned_access_speed_all_cpus(); >> >> if (!has_vector()) { >> for_each_online_cpu(cpu) >> per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED; >> - } else if (!check_vector_unaligned_access_emulated_all_cpus() && >> + } else if (!check_vector_unaligned_access_table() && >> + !check_vector_unaligned_access_emulated_all_cpus() && >> IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) { >> kthread_run(vec_check_unaligned_access_speed_all_cpus, >> NULL, "vec_check_unaligned_access_speed_all_cpus"); > >> >> Regarding your table, it feels like a bit going back to old hardcoded >> platform description ;). I think some kind of auto-detection of speed >> (not builtin the kernel) for platforms could be good as well to skip >> probing. >> >> A DT property also seems ok to me since the goal is to describe >> hardware. Would a common DT/ACPI property be appropriate ? The >> device_property API unified both so if we used some common property to >> describe the misaligned access speed (both in DT cpu node/ ACPI CPU >> device package), we could keep a single parsing method. But I'm no ACPI >> expert so I don't know if that really make sense. >> >> Thanks, >> >> Clément >> >>> >>> Thanks, >>> drew >>
On Mon, Feb 10, 2025 at 09:42:25PM +0100, Clément Léger wrote: > > > On 10/02/2025 18:20, Charlie Jenkins wrote: > > On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote: > >> > >> > >> On 10/02/2025 15:06, Andrew Jones wrote: > >>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote: > >>>> > >>>> > >>>> On 10/02/2025 11:16, Anup Patel wrote: > >>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote: > >>>>>> > >>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: > >>>>>>> Probing unaligned accesses on boot is time consuming. Provide a > >>>>>>> function which will be used to look up the access type in a table > >>>>>>> by id registers. Vendors which provide table entries can then skip > >>>>>>> the probing. > >>>>>> > >>>>>> The access checker in my experience is only time consuming on slow > >>>>>> hardware. Hardware that supports fast unaligned accesses isn't really > >>>>>> impacted by this? Avoiding a list of hardware that has slow/fast > >>>>>> unaligned accesses in the kernel was the main reason for dynamically > >>>>>> checking. We did introduce the config option to compile the kernel with > >>>>>> assumed slow/fast accesses, which of course has the downside of > >>>>>> recompiling the kernel and I assume that you already considered that. > >>>>> > >>>>> The kconfig option does not align with the vision of running the same > >>>>> kernel image across platforms. > >>>> > >>>> I'd would be advocating to remove compile time options as well and use > >>>> another way to skip the probe (see below). > >>>> > >>>>> > >>>>>> > >>>>>> Instead of having a table in the kernel, something that would be more > >>>>>> platform agnostic would be to have an extension that signals this > >>>>>> information. That seems like it would accomplish the same goal and > >>>>>> leverage the existing infrastructure in the kernel, albeit with the need > >>>>>> to make a new extension. > >>>>>> > >>>>> > >>>>> IMO, expecting an ISA extension to be defined for all possible > >>>>> microarchitectural choices is not going to scale so it is better > >>>>> to have infrastructure in kernel itself to infer microarchitectural > >>>>> choices based on RISC-V implementation ID. > >>>> > >>>> Since adding an extension seems quite unlikely, and that a device-tree > >>>> property is likely DT centric and not applicable to ACPI as well, was a > >>>> command line argument considered ? > >>>> > >>> > >>> I did consider adding a command line option in addition to the table, > >>> allowing platforms which neither have a table entry [yet] nor want to do > >>> the speed test, to set whatever they like. In the end, I dropped it, since > >>> I don't have a use case at this time. However, if we really don't want a > >>> table, then I can look into the command line option instead. > >> > >> Sorry if I wasn't clear, I wasn't considering this as a replacement for > >> your table but rather as a replacement to Charlie's compile time define > >> to skip misaligned speed probing since it is like "lpj=<x>". You can > >> specify it on command line if you want to skip the loop time detection > >> of loops per jiffies and have faster boot. > > > > Jesse sent out a patch for a kernel parameter to set the access speed to > > whatever is desired [1]. > > Hey Charlie, > > Thanks but it seems you forgot to add the link ? Oops, I frequently do that... https://lore.kernel.org/linux-riscv/20240805173816.3722002-1-jesse@rivosinc.com/ > > Having configuration option + command line option seems like something > particularly heavy for such feature. The ifdefery/config options > involved in the misaligned probing code is already quite complicated. If > another mean to specify the misaligned speed access is added, I think > all configuration options to set the speed of accesses can then be > removed and just keep the command line. That will certainly simplify the > ifdef/config options. Yeah that's why it didn't get merged because it felt like overkill. I responded on the thread to Anup as why I would prefer config options. It just comes down to config options being required to enable compiler features. The kernel is only built with rv64gc and usage of all other extensions requires hand written assembly. There are easy performance gains when compiling the kernel with rv64gc_zba_zbb_zbkb etc. Performance focused kernels will need to be recompiled anyway so I am of the opinion that grouping in other performance features as config options like this is the easiest thing to do and reduces the amount of code in the kernel. - Charlie > > Clément > > > > > - Charlie > > > >> -} > >> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */ > >> -static void __init check_unaligned_access_speed_all_cpus(void) > >> -{ > >> -} > >> -#endif > >> - > >> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS > >> static void check_vector_unaligned_access(struct work_struct *work __always_unused) > >> { > >> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway > >> } > >> #endif > >> > >> +static bool check_vector_unaligned_access_table(void) > >> +{ > >> + return false; > >> +} > >> + > >> static int riscv_online_cpu_vec(unsigned int cpu) > >> { > >> if (!has_vector()) { > >> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu) > >> return 0; > >> } > >> > >> + if (check_vector_unaligned_access_table()) > >> + return 0; > >> + > >> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS > >> if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN) > >> return 0; > >> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void) > >> { > >> int cpu; > >> > >> - if (!check_unaligned_access_emulated_all_cpus()) > >> + if (!check_unaligned_access_table() && > >> + !check_unaligned_access_emulated_all_cpus()) > >> check_unaligned_access_speed_all_cpus(); > >> > >> if (!has_vector()) { > >> for_each_online_cpu(cpu) > >> per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED; > >> - } else if (!check_vector_unaligned_access_emulated_all_cpus() && > >> + } else if (!check_vector_unaligned_access_table() && > >> + !check_vector_unaligned_access_emulated_all_cpus() && > >> IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) { > >> kthread_run(vec_check_unaligned_access_speed_all_cpus, > >> NULL, "vec_check_unaligned_access_speed_all_cpus"); > > > >> > >> Regarding your table, it feels like a bit going back to old hardcoded > >> platform description ;). I think some kind of auto-detection of speed > >> (not builtin the kernel) for platforms could be good as well to skip > >> probing. > >> > >> A DT property also seems ok to me since the goal is to describe > >> hardware. Would a common DT/ACPI property be appropriate ? The > >> device_property API unified both so if we used some common property to > >> describe the misaligned access speed (both in DT cpu node/ ACPI CPU > >> device package), we could keep a single parsing method. But I'm no ACPI > >> expert so I don't know if that really make sense. > >> > >> Thanks, > >> > >> Clément > >> > >>> > >>> Thanks, > >>> drew > >> >
On 10/02/2025 21:53, Charlie Jenkins wrote: > On Mon, Feb 10, 2025 at 09:42:25PM +0100, Clément Léger wrote: >> >> >> On 10/02/2025 18:20, Charlie Jenkins wrote: >>> On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote: >>>> >>>> >>>> On 10/02/2025 15:06, Andrew Jones wrote: >>>>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote: >>>>>> >>>>>> >>>>>> On 10/02/2025 11:16, Anup Patel wrote: >>>>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote: >>>>>>>> >>>>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: >>>>>>>>> Probing unaligned accesses on boot is time consuming. Provide a >>>>>>>>> function which will be used to look up the access type in a table >>>>>>>>> by id registers. Vendors which provide table entries can then skip >>>>>>>>> the probing. >>>>>>>> >>>>>>>> The access checker in my experience is only time consuming on slow >>>>>>>> hardware. Hardware that supports fast unaligned accesses isn't really >>>>>>>> impacted by this? Avoiding a list of hardware that has slow/fast >>>>>>>> unaligned accesses in the kernel was the main reason for dynamically >>>>>>>> checking. We did introduce the config option to compile the kernel with >>>>>>>> assumed slow/fast accesses, which of course has the downside of >>>>>>>> recompiling the kernel and I assume that you already considered that. >>>>>>> >>>>>>> The kconfig option does not align with the vision of running the same >>>>>>> kernel image across platforms. >>>>>> >>>>>> I'd would be advocating to remove compile time options as well and use >>>>>> another way to skip the probe (see below). >>>>>> >>>>>>> >>>>>>>> >>>>>>>> Instead of having a table in the kernel, something that would be more >>>>>>>> platform agnostic would be to have an extension that signals this >>>>>>>> information. That seems like it would accomplish the same goal and >>>>>>>> leverage the existing infrastructure in the kernel, albeit with the need >>>>>>>> to make a new extension. >>>>>>>> >>>>>>> >>>>>>> IMO, expecting an ISA extension to be defined for all possible >>>>>>> microarchitectural choices is not going to scale so it is better >>>>>>> to have infrastructure in kernel itself to infer microarchitectural >>>>>>> choices based on RISC-V implementation ID. >>>>>> >>>>>> Since adding an extension seems quite unlikely, and that a device-tree >>>>>> property is likely DT centric and not applicable to ACPI as well, was a >>>>>> command line argument considered ? >>>>>> >>>>> >>>>> I did consider adding a command line option in addition to the table, >>>>> allowing platforms which neither have a table entry [yet] nor want to do >>>>> the speed test, to set whatever they like. In the end, I dropped it, since >>>>> I don't have a use case at this time. However, if we really don't want a >>>>> table, then I can look into the command line option instead. >>>> >>>> Sorry if I wasn't clear, I wasn't considering this as a replacement for >>>> your table but rather as a replacement to Charlie's compile time define >>>> to skip misaligned speed probing since it is like "lpj=<x>". You can >>>> specify it on command line if you want to skip the loop time detection >>>> of loops per jiffies and have faster boot. >>> >>> Jesse sent out a patch for a kernel parameter to set the access speed to >>> whatever is desired [1]. >> >> Hey Charlie, >> >> Thanks but it seems you forgot to add the link ? > > Oops, I frequently do that... > > https://lore.kernel.org/linux-riscv/20240805173816.3722002-1-jesse@rivosinc.com/ > >> >> Having configuration option + command line option seems like something >> particularly heavy for such feature. The ifdefery/config options >> involved in the misaligned probing code is already quite complicated. If >> another mean to specify the misaligned speed access is added, I think >> all configuration options to set the speed of accesses can then be >> removed and just keep the command line. That will certainly simplify the >> ifdef/config options. > > Yeah that's why it didn't get merged because it felt like overkill. I > responded on the thread to Anup as why I would prefer config options. It > just comes down to config options being required to enable compiler > features. The kernel is only built with rv64gc and usage of all other > extensions requires hand written assembly. There are easy performance > gains when compiling the kernel with rv64gc_zba_zbb_zbkb etc. > Performance focused kernels will need to be recompiled anyway so I am of > the opinion that grouping in other performance features as config > options like this is the easiest thing to do and reduces the amount of > code in the kernel. As answered on the other thread, totally agree, except for the misaligned accesses probing config options ;). Ultimately, we need profiles configuration, either via defconfigs that enables a bunch of optimization via ISA extension or configuration options that groups these config options. Clément > > - Charlie > >> >> Clément >> >>> >>> - Charlie >>> >>>> -} >>>> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */ >>>> -static void __init check_unaligned_access_speed_all_cpus(void) >>>> -{ >>>> -} >>>> -#endif >>>> - >>>> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS >>>> static void check_vector_unaligned_access(struct work_struct *work __always_unused) >>>> { >>>> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway >>>> } >>>> #endif >>>> >>>> +static bool check_vector_unaligned_access_table(void) >>>> +{ >>>> + return false; >>>> +} >>>> + >>>> static int riscv_online_cpu_vec(unsigned int cpu) >>>> { >>>> if (!has_vector()) { >>>> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu) >>>> return 0; >>>> } >>>> >>>> + if (check_vector_unaligned_access_table()) >>>> + return 0; >>>> + >>>> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS >>>> if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN) >>>> return 0; >>>> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void) >>>> { >>>> int cpu; >>>> >>>> - if (!check_unaligned_access_emulated_all_cpus()) >>>> + if (!check_unaligned_access_table() && >>>> + !check_unaligned_access_emulated_all_cpus()) >>>> check_unaligned_access_speed_all_cpus(); >>>> >>>> if (!has_vector()) { >>>> for_each_online_cpu(cpu) >>>> per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED; >>>> - } else if (!check_vector_unaligned_access_emulated_all_cpus() && >>>> + } else if (!check_vector_unaligned_access_table() && >>>> + !check_vector_unaligned_access_emulated_all_cpus() && >>>> IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) { >>>> kthread_run(vec_check_unaligned_access_speed_all_cpus, >>>> NULL, "vec_check_unaligned_access_speed_all_cpus"); >>> >>>> >>>> Regarding your table, it feels like a bit going back to old hardcoded >>>> platform description ;). I think some kind of auto-detection of speed >>>> (not builtin the kernel) for platforms could be good as well to skip >>>> probing. >>>> >>>> A DT property also seems ok to me since the goal is to describe >>>> hardware. Would a common DT/ACPI property be appropriate ? The >>>> device_property API unified both so if we used some common property to >>>> describe the misaligned access speed (both in DT cpu node/ ACPI CPU >>>> device package), we could keep a single parsing method. But I'm no ACPI >>>> expert so I don't know if that really make sense. >>>> >>>> Thanks, >>>> >>>> Clément >>>> >>>>> >>>>> Thanks, >>>>> drew >>>> >>
On Mon, Feb 10, 2025 at 09:57:26PM +0100, Clément Léger wrote: > > > On 10/02/2025 21:53, Charlie Jenkins wrote: > > On Mon, Feb 10, 2025 at 09:42:25PM +0100, Clément Léger wrote: > >> > >> > >> On 10/02/2025 18:20, Charlie Jenkins wrote: > >>> On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote: > >>>> > >>>> > >>>> On 10/02/2025 15:06, Andrew Jones wrote: > >>>>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote: > >>>>>> > >>>>>> > >>>>>> On 10/02/2025 11:16, Anup Patel wrote: > >>>>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote: > >>>>>>>> > >>>>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: > >>>>>>>>> Probing unaligned accesses on boot is time consuming. Provide a > >>>>>>>>> function which will be used to look up the access type in a table > >>>>>>>>> by id registers. Vendors which provide table entries can then skip > >>>>>>>>> the probing. > >>>>>>>> > >>>>>>>> The access checker in my experience is only time consuming on slow > >>>>>>>> hardware. Hardware that supports fast unaligned accesses isn't really > >>>>>>>> impacted by this? Avoiding a list of hardware that has slow/fast > >>>>>>>> unaligned accesses in the kernel was the main reason for dynamically > >>>>>>>> checking. We did introduce the config option to compile the kernel with > >>>>>>>> assumed slow/fast accesses, which of course has the downside of > >>>>>>>> recompiling the kernel and I assume that you already considered that. > >>>>>>> > >>>>>>> The kconfig option does not align with the vision of running the same > >>>>>>> kernel image across platforms. > >>>>>> > >>>>>> I'd would be advocating to remove compile time options as well and use > >>>>>> another way to skip the probe (see below). > >>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>> Instead of having a table in the kernel, something that would be more > >>>>>>>> platform agnostic would be to have an extension that signals this > >>>>>>>> information. That seems like it would accomplish the same goal and > >>>>>>>> leverage the existing infrastructure in the kernel, albeit with the need > >>>>>>>> to make a new extension. > >>>>>>>> > >>>>>>> > >>>>>>> IMO, expecting an ISA extension to be defined for all possible > >>>>>>> microarchitectural choices is not going to scale so it is better > >>>>>>> to have infrastructure in kernel itself to infer microarchitectural > >>>>>>> choices based on RISC-V implementation ID. > >>>>>> > >>>>>> Since adding an extension seems quite unlikely, and that a device-tree > >>>>>> property is likely DT centric and not applicable to ACPI as well, was a > >>>>>> command line argument considered ? > >>>>>> > >>>>> > >>>>> I did consider adding a command line option in addition to the table, > >>>>> allowing platforms which neither have a table entry [yet] nor want to do > >>>>> the speed test, to set whatever they like. In the end, I dropped it, since > >>>>> I don't have a use case at this time. However, if we really don't want a > >>>>> table, then I can look into the command line option instead. > >>>> > >>>> Sorry if I wasn't clear, I wasn't considering this as a replacement for > >>>> your table but rather as a replacement to Charlie's compile time define > >>>> to skip misaligned speed probing since it is like "lpj=<x>". You can > >>>> specify it on command line if you want to skip the loop time detection > >>>> of loops per jiffies and have faster boot. > >>> > >>> Jesse sent out a patch for a kernel parameter to set the access speed to > >>> whatever is desired [1]. > >> > >> Hey Charlie, > >> > >> Thanks but it seems you forgot to add the link ? > > > > Oops, I frequently do that... > > > > https://lore.kernel.org/linux-riscv/20240805173816.3722002-1-jesse@rivosinc.com/ > > > >> > >> Having configuration option + command line option seems like something > >> particularly heavy for such feature. The ifdefery/config options > >> involved in the misaligned probing code is already quite complicated. If > >> another mean to specify the misaligned speed access is added, I think > >> all configuration options to set the speed of accesses can then be > >> removed and just keep the command line. That will certainly simplify the > >> ifdef/config options. > > > > Yeah that's why it didn't get merged because it felt like overkill. I > > responded on the thread to Anup as why I would prefer config options. It > > just comes down to config options being required to enable compiler > > features. The kernel is only built with rv64gc and usage of all other > > extensions requires hand written assembly. There are easy performance > > gains when compiling the kernel with rv64gc_zba_zbb_zbkb etc. > > Performance focused kernels will need to be recompiled anyway so I am of > > the opinion that grouping in other performance features as config > > options like this is the easiest thing to do and reduces the amount of > > code in the kernel. > > As answered on the other thread, totally agree, except for the > misaligned accesses probing config options ;). Oh! I have missed that response, where is that? > Ultimately, we need > profiles configuration, either via defconfigs that enables a bunch of > optimization via ISA extension or configuration options that groups > these config options. Why do you agree with profile configs for other things but not for misaligned access probing? > > Clément > > > > > - Charlie > > > >> > >> Clément > >> > >>> > >>> - Charlie > >>> > >>>> -} > >>>> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */ > >>>> -static void __init check_unaligned_access_speed_all_cpus(void) > >>>> -{ > >>>> -} > >>>> -#endif > >>>> - > >>>> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS > >>>> static void check_vector_unaligned_access(struct work_struct *work __always_unused) > >>>> { > >>>> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway > >>>> } > >>>> #endif > >>>> > >>>> +static bool check_vector_unaligned_access_table(void) > >>>> +{ > >>>> + return false; > >>>> +} > >>>> + > >>>> static int riscv_online_cpu_vec(unsigned int cpu) > >>>> { > >>>> if (!has_vector()) { > >>>> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu) > >>>> return 0; > >>>> } > >>>> > >>>> + if (check_vector_unaligned_access_table()) > >>>> + return 0; > >>>> + > >>>> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS > >>>> if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN) > >>>> return 0; > >>>> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void) > >>>> { > >>>> int cpu; > >>>> > >>>> - if (!check_unaligned_access_emulated_all_cpus()) > >>>> + if (!check_unaligned_access_table() && > >>>> + !check_unaligned_access_emulated_all_cpus()) > >>>> check_unaligned_access_speed_all_cpus(); > >>>> > >>>> if (!has_vector()) { > >>>> for_each_online_cpu(cpu) > >>>> per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED; > >>>> - } else if (!check_vector_unaligned_access_emulated_all_cpus() && > >>>> + } else if (!check_vector_unaligned_access_table() && > >>>> + !check_vector_unaligned_access_emulated_all_cpus() && > >>>> IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) { > >>>> kthread_run(vec_check_unaligned_access_speed_all_cpus, > >>>> NULL, "vec_check_unaligned_access_speed_all_cpus"); > >>> > >>>> > >>>> Regarding your table, it feels like a bit going back to old hardcoded > >>>> platform description ;). I think some kind of auto-detection of speed > >>>> (not builtin the kernel) for platforms could be good as well to skip > >>>> probing. > >>>> > >>>> A DT property also seems ok to me since the goal is to describe > >>>> hardware. Would a common DT/ACPI property be appropriate ? The > >>>> device_property API unified both so if we used some common property to > >>>> describe the misaligned access speed (both in DT cpu node/ ACPI CPU > >>>> device package), we could keep a single parsing method. But I'm no ACPI > >>>> expert so I don't know if that really make sense. > >>>> > >>>> Thanks, > >>>> > >>>> Clément > >>>> > >>>>> > >>>>> Thanks, > >>>>> drew > >>>> > >> >
On Tue, Feb 11, 2025 at 2:27 AM Clément Léger <cleger@rivosinc.com> wrote: > > > > On 10/02/2025 21:53, Charlie Jenkins wrote: > > On Mon, Feb 10, 2025 at 09:42:25PM +0100, Clément Léger wrote: > >> > >> > >> On 10/02/2025 18:20, Charlie Jenkins wrote: > >>> On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote: > >>>> > >>>> > >>>> On 10/02/2025 15:06, Andrew Jones wrote: > >>>>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote: > >>>>>> > >>>>>> > >>>>>> On 10/02/2025 11:16, Anup Patel wrote: > >>>>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote: > >>>>>>>> > >>>>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: > >>>>>>>>> Probing unaligned accesses on boot is time consuming. Provide a > >>>>>>>>> function which will be used to look up the access type in a table > >>>>>>>>> by id registers. Vendors which provide table entries can then skip > >>>>>>>>> the probing. > >>>>>>>> > >>>>>>>> The access checker in my experience is only time consuming on slow > >>>>>>>> hardware. Hardware that supports fast unaligned accesses isn't really > >>>>>>>> impacted by this? Avoiding a list of hardware that has slow/fast > >>>>>>>> unaligned accesses in the kernel was the main reason for dynamically > >>>>>>>> checking. We did introduce the config option to compile the kernel with > >>>>>>>> assumed slow/fast accesses, which of course has the downside of > >>>>>>>> recompiling the kernel and I assume that you already considered that. > >>>>>>> > >>>>>>> The kconfig option does not align with the vision of running the same > >>>>>>> kernel image across platforms. > >>>>>> > >>>>>> I'd would be advocating to remove compile time options as well and use > >>>>>> another way to skip the probe (see below). > >>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>> Instead of having a table in the kernel, something that would be more > >>>>>>>> platform agnostic would be to have an extension that signals this > >>>>>>>> information. That seems like it would accomplish the same goal and > >>>>>>>> leverage the existing infrastructure in the kernel, albeit with the need > >>>>>>>> to make a new extension. > >>>>>>>> > >>>>>>> > >>>>>>> IMO, expecting an ISA extension to be defined for all possible > >>>>>>> microarchitectural choices is not going to scale so it is better > >>>>>>> to have infrastructure in kernel itself to infer microarchitectural > >>>>>>> choices based on RISC-V implementation ID. > >>>>>> > >>>>>> Since adding an extension seems quite unlikely, and that a device-tree > >>>>>> property is likely DT centric and not applicable to ACPI as well, was a > >>>>>> command line argument considered ? > >>>>>> > >>>>> > >>>>> I did consider adding a command line option in addition to the table, > >>>>> allowing platforms which neither have a table entry [yet] nor want to do > >>>>> the speed test, to set whatever they like. In the end, I dropped it, since > >>>>> I don't have a use case at this time. However, if we really don't want a > >>>>> table, then I can look into the command line option instead. > >>>> > >>>> Sorry if I wasn't clear, I wasn't considering this as a replacement for > >>>> your table but rather as a replacement to Charlie's compile time define > >>>> to skip misaligned speed probing since it is like "lpj=<x>". You can > >>>> specify it on command line if you want to skip the loop time detection > >>>> of loops per jiffies and have faster boot. > >>> > >>> Jesse sent out a patch for a kernel parameter to set the access speed to > >>> whatever is desired [1]. > >> > >> Hey Charlie, > >> > >> Thanks but it seems you forgot to add the link ? > > > > Oops, I frequently do that... > > > > https://lore.kernel.org/linux-riscv/20240805173816.3722002-1-jesse@rivosinc.com/ > > > >> > >> Having configuration option + command line option seems like something > >> particularly heavy for such feature. The ifdefery/config options > >> involved in the misaligned probing code is already quite complicated. If > >> another mean to specify the misaligned speed access is added, I think > >> all configuration options to set the speed of accesses can then be > >> removed and just keep the command line. That will certainly simplify the > >> ifdef/config options. > > > > Yeah that's why it didn't get merged because it felt like overkill. I > > responded on the thread to Anup as why I would prefer config options. It > > just comes down to config options being required to enable compiler > > features. The kernel is only built with rv64gc and usage of all other > > extensions requires hand written assembly. There are easy performance > > gains when compiling the kernel with rv64gc_zba_zbb_zbkb etc. > > Performance focused kernels will need to be recompiled anyway so I am of > > the opinion that grouping in other performance features as config > > options like this is the easiest thing to do and reduces the amount of > > code in the kernel. > > As answered on the other thread, totally agree, except for the > misaligned accesses probing config options ;). Ultimately, we need > profiles configuration, either via defconfigs that enables a bunch of > optimization via ISA extension or configuration options that groups > these config options. I agree. Kconfig options for ISA extensions are fine but not for misaligned access. The kernel command-line option to skip misaligned access probe is fine as well. I still think RISC-V implementation ID based microarchitecture feature discovery will be eventually required as well so why not add it now. For profiles, how about having an incremental rva23.config similar to 64-bit.config ? This will minimize the duplication by re-using the same base defconfig. Regards, Anup > > Clément > > > > > - Charlie > > > >> > >> Clément > >> > >>> > >>> - Charlie > >>> > >>>> -} > >>>> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */ > >>>> -static void __init check_unaligned_access_speed_all_cpus(void) > >>>> -{ > >>>> -} > >>>> -#endif > >>>> - > >>>> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS > >>>> static void check_vector_unaligned_access(struct work_struct *work __always_unused) > >>>> { > >>>> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway > >>>> } > >>>> #endif > >>>> > >>>> +static bool check_vector_unaligned_access_table(void) > >>>> +{ > >>>> + return false; > >>>> +} > >>>> + > >>>> static int riscv_online_cpu_vec(unsigned int cpu) > >>>> { > >>>> if (!has_vector()) { > >>>> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu) > >>>> return 0; > >>>> } > >>>> > >>>> + if (check_vector_unaligned_access_table()) > >>>> + return 0; > >>>> + > >>>> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS > >>>> if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN) > >>>> return 0; > >>>> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void) > >>>> { > >>>> int cpu; > >>>> > >>>> - if (!check_unaligned_access_emulated_all_cpus()) > >>>> + if (!check_unaligned_access_table() && > >>>> + !check_unaligned_access_emulated_all_cpus()) > >>>> check_unaligned_access_speed_all_cpus(); > >>>> > >>>> if (!has_vector()) { > >>>> for_each_online_cpu(cpu) > >>>> per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED; > >>>> - } else if (!check_vector_unaligned_access_emulated_all_cpus() && > >>>> + } else if (!check_vector_unaligned_access_table() && > >>>> + !check_vector_unaligned_access_emulated_all_cpus() && > >>>> IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) { > >>>> kthread_run(vec_check_unaligned_access_speed_all_cpus, > >>>> NULL, "vec_check_unaligned_access_speed_all_cpus"); > >>> > >>>> > >>>> Regarding your table, it feels like a bit going back to old hardcoded > >>>> platform description ;). I think some kind of auto-detection of speed > >>>> (not builtin the kernel) for platforms could be good as well to skip > >>>> probing. > >>>> > >>>> A DT property also seems ok to me since the goal is to describe > >>>> hardware. Would a common DT/ACPI property be appropriate ? The > >>>> device_property API unified both so if we used some common property to > >>>> describe the misaligned access speed (both in DT cpu node/ ACPI CPU > >>>> device package), we could keep a single parsing method. But I'm no ACPI > >>>> expert so I don't know if that really make sense. > >>>> > >>>> Thanks, > >>>> > >>>> Clément > >>>> > >>>>> > >>>>> Thanks, > >>>>> drew > >>>> > >> >
On 11/02/2025 05:26, Anup Patel wrote: > On Tue, Feb 11, 2025 at 2:27 AM Clément Léger <cleger@rivosinc.com> wrote: >> >> >> >> On 10/02/2025 21:53, Charlie Jenkins wrote: >>> On Mon, Feb 10, 2025 at 09:42:25PM +0100, Clément Léger wrote: >>>> >>>> >>>> On 10/02/2025 18:20, Charlie Jenkins wrote: >>>>> On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote: >>>>>> >>>>>> >>>>>> On 10/02/2025 15:06, Andrew Jones wrote: >>>>>>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 10/02/2025 11:16, Anup Patel wrote: >>>>>>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote: >>>>>>>>>> >>>>>>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: >>>>>>>>>>> Probing unaligned accesses on boot is time consuming. Provide a >>>>>>>>>>> function which will be used to look up the access type in a table >>>>>>>>>>> by id registers. Vendors which provide table entries can then skip >>>>>>>>>>> the probing. >>>>>>>>>> >>>>>>>>>> The access checker in my experience is only time consuming on slow >>>>>>>>>> hardware. Hardware that supports fast unaligned accesses isn't really >>>>>>>>>> impacted by this? Avoiding a list of hardware that has slow/fast >>>>>>>>>> unaligned accesses in the kernel was the main reason for dynamically >>>>>>>>>> checking. We did introduce the config option to compile the kernel with >>>>>>>>>> assumed slow/fast accesses, which of course has the downside of >>>>>>>>>> recompiling the kernel and I assume that you already considered that. >>>>>>>>> >>>>>>>>> The kconfig option does not align with the vision of running the same >>>>>>>>> kernel image across platforms. >>>>>>>> >>>>>>>> I'd would be advocating to remove compile time options as well and use >>>>>>>> another way to skip the probe (see below). >>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Instead of having a table in the kernel, something that would be more >>>>>>>>>> platform agnostic would be to have an extension that signals this >>>>>>>>>> information. That seems like it would accomplish the same goal and >>>>>>>>>> leverage the existing infrastructure in the kernel, albeit with the need >>>>>>>>>> to make a new extension. >>>>>>>>>> >>>>>>>>> >>>>>>>>> IMO, expecting an ISA extension to be defined for all possible >>>>>>>>> microarchitectural choices is not going to scale so it is better >>>>>>>>> to have infrastructure in kernel itself to infer microarchitectural >>>>>>>>> choices based on RISC-V implementation ID. >>>>>>>> >>>>>>>> Since adding an extension seems quite unlikely, and that a device-tree >>>>>>>> property is likely DT centric and not applicable to ACPI as well, was a >>>>>>>> command line argument considered ? >>>>>>>> >>>>>>> >>>>>>> I did consider adding a command line option in addition to the table, >>>>>>> allowing platforms which neither have a table entry [yet] nor want to do >>>>>>> the speed test, to set whatever they like. In the end, I dropped it, since >>>>>>> I don't have a use case at this time. However, if we really don't want a >>>>>>> table, then I can look into the command line option instead. >>>>>> >>>>>> Sorry if I wasn't clear, I wasn't considering this as a replacement for >>>>>> your table but rather as a replacement to Charlie's compile time define >>>>>> to skip misaligned speed probing since it is like "lpj=<x>". You can >>>>>> specify it on command line if you want to skip the loop time detection >>>>>> of loops per jiffies and have faster boot. >>>>> >>>>> Jesse sent out a patch for a kernel parameter to set the access speed to >>>>> whatever is desired [1]. >>>> >>>> Hey Charlie, >>>> >>>> Thanks but it seems you forgot to add the link ? >>> >>> Oops, I frequently do that... >>> >>> https://lore.kernel.org/linux-riscv/20240805173816.3722002-1-jesse@rivosinc.com/ >>> >>>> >>>> Having configuration option + command line option seems like something >>>> particularly heavy for such feature. The ifdefery/config options >>>> involved in the misaligned probing code is already quite complicated. If >>>> another mean to specify the misaligned speed access is added, I think >>>> all configuration options to set the speed of accesses can then be >>>> removed and just keep the command line. That will certainly simplify the >>>> ifdef/config options. >>> >>> Yeah that's why it didn't get merged because it felt like overkill. I >>> responded on the thread to Anup as why I would prefer config options. It >>> just comes down to config options being required to enable compiler >>> features. The kernel is only built with rv64gc and usage of all other >>> extensions requires hand written assembly. There are easy performance >>> gains when compiling the kernel with rv64gc_zba_zbb_zbkb etc. >>> Performance focused kernels will need to be recompiled anyway so I am of >>> the opinion that grouping in other performance features as config >>> options like this is the easiest thing to do and reduces the amount of >>> code in the kernel. >> >> As answered on the other thread, totally agree, except for the >> misaligned accesses probing config options ;). Ultimately, we need >> profiles configuration, either via defconfigs that enables a bunch of >> optimization via ISA extension or configuration options that groups >> these config options. > > I agree. > > Kconfig options for ISA extensions are fine but not for > misaligned access. The kernel command-line option to skip > misaligned access probe is fine as well. I still think RISC-V > implementation ID based microarchitecture feature discovery > will be eventually required as well so why not add it now. As said by Charlie, keeping an internal list of IDs that needs to be updated/submitted to add new ones is quite a burden to maintain. IMHO, this would be better to be put in ACPI or DT description since it purely describe hardware behavior. But you said it will be "eventually required" so you might have some specific use-case in mind ? > > For profiles, how about having an incremental rva23.config > similar to 64-bit.config ? This will minimize the duplication > by re-using the same base defconfig. .config or a CONFIG_PROFILE option that allow to select the profile for to be supported (CONFIG_PROFILE_RVA23). Using profile config options might be better though since we can have better dependencies/select between profiles rather than duplicating them in various profile .config. So I might be in favor of a CONFIG_PROFILE rather than a .config per profile. BTW, that's not totally exclusive, if we want to add a rva23.config that contains CONFIG_PROFILE_RVA23=y, that works as well, although a bit overkill ;). Regards, Clément > > Regards, > Anup > > > >> >> Clément >> >>> >>> - Charlie >>> >>>> >>>> Clément >>>> >>>>> >>>>> - Charlie >>>>> >>>>>> -} >>>>>> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */ >>>>>> -static void __init check_unaligned_access_speed_all_cpus(void) >>>>>> -{ >>>>>> -} >>>>>> -#endif >>>>>> - >>>>>> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS >>>>>> static void check_vector_unaligned_access(struct work_struct *work __always_unused) >>>>>> { >>>>>> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway >>>>>> } >>>>>> #endif >>>>>> >>>>>> +static bool check_vector_unaligned_access_table(void) >>>>>> +{ >>>>>> + return false; >>>>>> +} >>>>>> + >>>>>> static int riscv_online_cpu_vec(unsigned int cpu) >>>>>> { >>>>>> if (!has_vector()) { >>>>>> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu) >>>>>> return 0; >>>>>> } >>>>>> >>>>>> + if (check_vector_unaligned_access_table()) >>>>>> + return 0; >>>>>> + >>>>>> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS >>>>>> if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN) >>>>>> return 0; >>>>>> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void) >>>>>> { >>>>>> int cpu; >>>>>> >>>>>> - if (!check_unaligned_access_emulated_all_cpus()) >>>>>> + if (!check_unaligned_access_table() && >>>>>> + !check_unaligned_access_emulated_all_cpus()) >>>>>> check_unaligned_access_speed_all_cpus(); >>>>>> >>>>>> if (!has_vector()) { >>>>>> for_each_online_cpu(cpu) >>>>>> per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED; >>>>>> - } else if (!check_vector_unaligned_access_emulated_all_cpus() && >>>>>> + } else if (!check_vector_unaligned_access_table() && >>>>>> + !check_vector_unaligned_access_emulated_all_cpus() && >>>>>> IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) { >>>>>> kthread_run(vec_check_unaligned_access_speed_all_cpus, >>>>>> NULL, "vec_check_unaligned_access_speed_all_cpus"); >>>>> >>>>>> >>>>>> Regarding your table, it feels like a bit going back to old hardcoded >>>>>> platform description ;). I think some kind of auto-detection of speed >>>>>> (not builtin the kernel) for platforms could be good as well to skip >>>>>> probing. >>>>>> >>>>>> A DT property also seems ok to me since the goal is to describe >>>>>> hardware. Would a common DT/ACPI property be appropriate ? The >>>>>> device_property API unified both so if we used some common property to >>>>>> describe the misaligned access speed (both in DT cpu node/ ACPI CPU >>>>>> device package), we could keep a single parsing method. But I'm no ACPI >>>>>> expert so I don't know if that really make sense. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Clément >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> drew >>>>>> >>>> >>
On Mon, Feb 10, 2025 at 09:37:10PM +0100, Clément Léger wrote: > > > On 10/02/2025 18:19, Charlie Jenkins wrote: > > On Mon, Feb 10, 2025 at 03:46:46PM +0530, Anup Patel wrote: > >> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote: > >>> > >>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: > >>>> Probing unaligned accesses on boot is time consuming. Provide a > >>>> function which will be used to look up the access type in a table > >>>> by id registers. Vendors which provide table entries can then skip > >>>> the probing. > >>> > >>> The access checker in my experience is only time consuming on slow > >>> hardware. Hardware that supports fast unaligned accesses isn't really > >>> impacted by this? Avoiding a list of hardware that has slow/fast > >>> unaligned accesses in the kernel was the main reason for dynamically > >>> checking. We did introduce the config option to compile the kernel with > >>> assumed slow/fast accesses, which of course has the downside of > >>> recompiling the kernel and I assume that you already considered that. > >> > >> The kconfig option does not align with the vision of running the same > >> kernel image across platforms. > > > > I just don't think that vision is realistic. > > > > I am a proponent for compile time defines because ri ght now we are > > catering the kernel to both microcontrollers and for high performance > > platforms. I am in favor of having a set of configur ations that are > > ideal for these microcontrollers and a different set for high > > performance platforms. This is where the RVI profile s would ideally > > come in, having different configs for different profiles that target low > > performance/high performance. > > > > Compiler optimizations for extensions are not possib le to do by just > > having these different methods of selecting at runti me. By enabling > > extra extensions like the bitmanip extensions during compilation via a > > config flag we can optimize the entire kernel. It is not possible to > > push all optimizations off to runtime detection. > > While this might be true for the bitmanip extension and other extensions > that the compiler can take advantage of, that isn't true for the > misaligned speed probing code. The only meaningful misaligned access > configuration option for kernel "speed" optimization is > RISCV_EFFICIENT_UNALIGNED_ACCESS (which is ironically not easily > selectable since it is under NON_PORTABLE). > > Currently all the config options that have been added around misaligned > support "just" allows to get rid of the probing code at compile time and > set a predefined speed. That does not really improve the kernel > performance itself, just allows for a faster boot. That could as well be > supported using a command line option as suggested. I agree. I'll look into stripping config options and picking up Jesse's command line option for v2. I'll also drop the table approach of this series since I don't have a strong enough justification for it over the command line at this time. Thanks, drew > > That being said, I agree that some additional configuration options > should be added to enable additional extension support at compile time, > enabling a faster kernel. Having a single image for all hardware is as > you said not realistic but profiles configuration as you proposed might > be the key for this support. > > Clément > > > > >> > >>> > >>> Instead of having a table in the kernel, something that would be more > >>> platform agnostic would be to have an extension that signals this > >>> information. That seems like it would accomplish the same goal and > >>> leverage the existing infrastructure in the kernel, albeit with the need > >>> to make a new extension. > >>> > >> > >> IMO, expecting an ISA extension to be defined for all possible > >> microarchitectural choices is not going to scale so it is better > >> to have infrastructure in kernel itself to infer microarchitectural > >> choices based on RISC-V implementation ID. > > > > How is keeping tables in the kernel for all microarchitectural details > > any more scalable than having extensions that do the same thing? I would > > argue that having it in the kernel is less scalable since it needs to be > > described for all implementation IDs, and all changes require going > > through the kernel review process. Dynamic probing avoids these issues. > > Having an extension has the one-time process of getting the extension > > into something like a profile, but then anybody could use it without > > needing a kernel patch. > > > > - Charlie > > > >> > >> Regards, > >> Anup > > > > _______________________________________________ > > linux-riscv mailing list > > linux-riscv@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/linux-riscv >
On Mon, 10 Feb 2025 20:26:32 PST (-0800), apatel@ventanamicro.com wrote: > On Tue, Feb 11, 2025 at 2:27 AM Clément Léger <cleger@rivosinc.com> wrote: >> >> >> >> On 10/02/2025 21:53, Charlie Jenkins wrote: >> > On Mon, Feb 10, 2025 at 09:42:25PM +0100, Clément Léger wrote: >> >> >> >> >> >> On 10/02/2025 18:20, Charlie Jenkins wrote: >> >>> On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote: >> >>>> >> >>>> >> >>>> On 10/02/2025 15:06, Andrew Jones wrote: >> >>>>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote: >> >>>>>> >> >>>>>> >> >>>>>> On 10/02/2025 11:16, Anup Patel wrote: >> >>>>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote: >> >>>>>>>> >> >>>>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote: >> >>>>>>>>> Probing unaligned accesses on boot is time consuming. Provide a >> >>>>>>>>> function which will be used to look up the access type in a table >> >>>>>>>>> by id registers. Vendors which provide table entries can then skip >> >>>>>>>>> the probing. >> >>>>>>>> >> >>>>>>>> The access checker in my experience is only time consuming on slow >> >>>>>>>> hardware. Hardware that supports fast unaligned accesses isn't really >> >>>>>>>> impacted by this? Avoiding a list of hardware that has slow/fast >> >>>>>>>> unaligned accesses in the kernel was the main reason for dynamically >> >>>>>>>> checking. We did introduce the config option to compile the kernel with >> >>>>>>>> assumed slow/fast accesses, which of course has the downside of >> >>>>>>>> recompiling the kernel and I assume that you already considered that. >> >>>>>>> >> >>>>>>> The kconfig option does not align with the vision of running the same >> >>>>>>> kernel image across platforms. >> >>>>>> >> >>>>>> I'd would be advocating to remove compile time options as well and use >> >>>>>> another way to skip the probe (see below). >> >>>>>> >> >>>>>>> >> >>>>>>>> >> >>>>>>>> Instead of having a table in the kernel, something that would be more >> >>>>>>>> platform agnostic would be to have an extension that signals this >> >>>>>>>> information. That seems like it would accomplish the same goal and >> >>>>>>>> leverage the existing infrastructure in the kernel, albeit with the need >> >>>>>>>> to make a new extension. >> >>>>>>>> >> >>>>>>> >> >>>>>>> IMO, expecting an ISA extension to be defined for all possible >> >>>>>>> microarchitectural choices is not going to scale so it is better >> >>>>>>> to have infrastructure in kernel itself to infer microarchitectural >> >>>>>>> choices based on RISC-V implementation ID. >> >>>>>> >> >>>>>> Since adding an extension seems quite unlikely, and that a device-tree >> >>>>>> property is likely DT centric and not applicable to ACPI as well, was a >> >>>>>> command line argument considered ? >> >>>>>> >> >>>>> >> >>>>> I did consider adding a command line option in addition to the table, >> >>>>> allowing platforms which neither have a table entry [yet] nor want to do >> >>>>> the speed test, to set whatever they like. In the end, I dropped it, since >> >>>>> I don't have a use case at this time. However, if we really don't want a >> >>>>> table, then I can look into the command line option instead. >> >>>> >> >>>> Sorry if I wasn't clear, I wasn't considering this as a replacement for >> >>>> your table but rather as a replacement to Charlie's compile time define >> >>>> to skip misaligned speed probing since it is like "lpj=<x>". You can >> >>>> specify it on command line if you want to skip the loop time detection >> >>>> of loops per jiffies and have faster boot. >> >>> >> >>> Jesse sent out a patch for a kernel parameter to set the access speed to >> >>> whatever is desired [1]. >> >> >> >> Hey Charlie, >> >> >> >> Thanks but it seems you forgot to add the link ? >> > >> > Oops, I frequently do that... >> > >> > https://lore.kernel.org/linux-riscv/20240805173816.3722002-1-jesse@rivosinc.com/ >> > >> >> >> >> Having configuration option + command line option seems like something >> >> particularly heavy for such feature. The ifdefery/config options >> >> involved in the misaligned probing code is already quite complicated. If >> >> another mean to specify the misaligned speed access is added, I think >> >> all configuration options to set the speed of accesses can then be >> >> removed and just keep the command line. That will certainly simplify the >> >> ifdef/config options. >> > >> > Yeah that's why it didn't get merged because it felt like overkill. I >> > responded on the thread to Anup as why I would prefer config options. It >> > just comes down to config options being required to enable compiler >> > features. The kernel is only built with rv64gc and usage of all other >> > extensions requires hand written assembly. There are easy performance >> > gains when compiling the kernel with rv64gc_zba_zbb_zbkb etc. >> > Performance focused kernels will need to be recompiled anyway so I am of >> > the opinion that grouping in other performance features as config >> > options like this is the easiest thing to do and reduces the amount of >> > code in the kernel. >> >> As answered on the other thread, totally agree, except for the >> misaligned accesses probing config options ;). Ultimately, we need >> profiles configuration, either via defconfigs that enables a bunch of >> optimization via ISA extension or configuration options that groups >> these config options. > > I agree. > > Kconfig options for ISA extensions are fine but not for > misaligned access. The kernel command-line option to skip > misaligned access probe is fine as well. IMO it's reasonable to have a Kconfig option that does something like "build a kernel that assumes misaligned accesses are fast and supported". We've got a bunch of other platform-level opt ins along those lines (drivers, extensions, etrrata, memory stuff, etc). It would almost certainly be a measurable performance boost to enable misaligned accesses when building Linux, it is for every other big codebase. > I still think RISC-V > implementation ID based microarchitecture feature discovery > will be eventually required as well so why not add it now. Ya, the probing loops are nasty. We originally tried adding this to the DT so we didn't have to probe for it, but that got shot down. IMO it's way cleaner than having tables of vendor IDs in the kernel, maybe if that's the backup option it'll convince the DT people to change their minds? > For profiles, how about having an incremental rva23.config > similar to 64-bit.config ? This will minimize the duplication > by re-using the same base defconfig. The profiles don't really tell us anything, though. They're really just marketing material, vendors just ignore the actually requirements and stamp whatever they ship as compliant. If we start adding configs we should just go with some sort of "support what shipped in 2024" type definitions, that'd let us Kconfig-off some cruft for supporting old systems while still being able to actually target something concrete. > > Regards, > Anup > > > >> >> Clément >> >> > >> > - Charlie >> > >> >> >> >> Clément >> >> >> >>> >> >>> - Charlie >> >>> >> >>>> -} >> >>>> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */ >> >>>> -static void __init check_unaligned_access_speed_all_cpus(void) >> >>>> -{ >> >>>> -} >> >>>> -#endif >> >>>> - >> >>>> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS >> >>>> static void check_vector_unaligned_access(struct work_struct *work __always_unused) >> >>>> { >> >>>> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway >> >>>> } >> >>>> #endif >> >>>> >> >>>> +static bool check_vector_unaligned_access_table(void) >> >>>> +{ >> >>>> + return false; >> >>>> +} >> >>>> + >> >>>> static int riscv_online_cpu_vec(unsigned int cpu) >> >>>> { >> >>>> if (!has_vector()) { >> >>>> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu) >> >>>> return 0; >> >>>> } >> >>>> >> >>>> + if (check_vector_unaligned_access_table()) >> >>>> + return 0; >> >>>> + >> >>>> #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS >> >>>> if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN) >> >>>> return 0; >> >>>> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void) >> >>>> { >> >>>> int cpu; >> >>>> >> >>>> - if (!check_unaligned_access_emulated_all_cpus()) >> >>>> + if (!check_unaligned_access_table() && >> >>>> + !check_unaligned_access_emulated_all_cpus()) >> >>>> check_unaligned_access_speed_all_cpus(); >> >>>> >> >>>> if (!has_vector()) { >> >>>> for_each_online_cpu(cpu) >> >>>> per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED; >> >>>> - } else if (!check_vector_unaligned_access_emulated_all_cpus() && >> >>>> + } else if (!check_vector_unaligned_access_table() && >> >>>> + !check_vector_unaligned_access_emulated_all_cpus() && >> >>>> IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) { >> >>>> kthread_run(vec_check_unaligned_access_speed_all_cpus, >> >>>> NULL, "vec_check_unaligned_access_speed_all_cpus"); >> >>> >> >>>> >> >>>> Regarding your table, it feels like a bit going back to old hardcoded >> >>>> platform description ;). I think some kind of auto-detection of speed >> >>>> (not builtin the kernel) for platforms could be good as well to skip >> >>>> probing. >> >>>> >> >>>> A DT property also seems ok to me since the goal is to describe >> >>>> hardware. Would a common DT/ACPI property be appropriate ? The >> >>>> device_property API unified both so if we used some common property to >> >>>> describe the misaligned access speed (both in DT cpu node/ ACPI CPU >> >>>> device package), we could keep a single parsing method. But I'm no ACPI >> >>>> expert so I don't know if that really make sense. >> >>>> >> >>>> Thanks, >> >>>> >> >>>> Clément >> >>>> >> >>>>> >> >>>>> Thanks, >> >>>>> drew >> >>>> >> >> >>
diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c index d9d4ca1fadc7..f8497097e79d 100644 --- a/arch/riscv/kernel/unaligned_access_speed.c +++ b/arch/riscv/kernel/unaligned_access_speed.c @@ -130,6 +130,50 @@ static void __init check_unaligned_access_nonboot_cpu(void *param) check_unaligned_access(pages[cpu]); } +/* Measure unaligned access speed on all CPUs present at boot in parallel. */ +static void __init check_unaligned_access_speed_all_cpus(void) +{ + unsigned int cpu; + unsigned int cpu_count = num_possible_cpus(); + struct page **bufs = kcalloc(cpu_count, sizeof(*bufs), GFP_KERNEL); + + if (!bufs) { + pr_warn("Allocation failure, not measuring misaligned performance\n"); + return; + } + + /* + * Allocate separate buffers for each CPU so there's no fighting over + * cache lines. + */ + for_each_cpu(cpu, cpu_online_mask) { + bufs[cpu] = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER); + if (!bufs[cpu]) { + pr_warn("Allocation failure, not measuring misaligned performance\n"); + goto out; + } + } + + /* Check everybody except 0, who stays behind to tend jiffies. */ + on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1); + + /* Check core 0. */ + smp_call_on_cpu(0, check_unaligned_access, bufs[0], true); + +out: + for_each_cpu(cpu, cpu_online_mask) { + if (bufs[cpu]) + __free_pages(bufs[cpu], MISALIGNED_BUFFER_ORDER); + } + + kfree(bufs); +} +#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */ +static void __init check_unaligned_access_speed_all_cpus(void) +{ +} +#endif + DEFINE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key); static void modify_unaligned_access_branches(cpumask_t *mask, int weight) @@ -186,8 +230,17 @@ static int __init lock_and_set_unaligned_access_static_branch(void) arch_initcall_sync(lock_and_set_unaligned_access_static_branch); +static bool check_unaligned_access_table(void) +{ + return false; +} + static int riscv_online_cpu(unsigned int cpu) { + if (check_unaligned_access_table()) + goto exit; + +#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS static struct page *buf; /* We are already set since the last check */ @@ -203,6 +256,7 @@ static int riscv_online_cpu(unsigned int cpu) check_unaligned_access(buf); __free_pages(buf, MISALIGNED_BUFFER_ORDER); +#endif exit: set_unaligned_access_static_branches(); @@ -217,50 +271,6 @@ static int riscv_offline_cpu(unsigned int cpu) return 0; } -/* Measure unaligned access speed on all CPUs present at boot in parallel. */ -static void __init check_unaligned_access_speed_all_cpus(void) -{ - unsigned int cpu; - unsigned int cpu_count = num_possible_cpus(); - struct page **bufs = kcalloc(cpu_count, sizeof(*bufs), GFP_KERNEL); - - if (!bufs) { - pr_warn("Allocation failure, not measuring misaligned performance\n"); - return; - } - - /* - * Allocate separate buffers for each CPU so there's no fighting over - * cache lines. - */ - for_each_cpu(cpu, cpu_online_mask) { - bufs[cpu] = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER); - if (!bufs[cpu]) { - pr_warn("Allocation failure, not measuring misaligned performance\n"); - goto out; - } - } - - /* Check everybody except 0, who stays behind to tend jiffies. */ - on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1); - - /* Check core 0. */ - smp_call_on_cpu(0, check_unaligned_access, bufs[0], true); - -out: - for_each_cpu(cpu, cpu_online_mask) { - if (bufs[cpu]) - __free_pages(bufs[cpu], MISALIGNED_BUFFER_ORDER); - } - - kfree(bufs); -} -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */ -static void __init check_unaligned_access_speed_all_cpus(void) -{ -} -#endif - #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS static void check_vector_unaligned_access(struct work_struct *work __always_unused) { @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway } #endif +static bool check_vector_unaligned_access_table(void) +{ + return false; +} + static int riscv_online_cpu_vec(unsigned int cpu) { if (!has_vector()) { @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu) return 0; } + if (check_vector_unaligned_access_table()) + return 0; + #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN) return 0; @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void) { int cpu; - if (!check_unaligned_access_emulated_all_cpus()) + if (!check_unaligned_access_table() && + !check_unaligned_access_emulated_all_cpus()) check_unaligned_access_speed_all_cpus(); if (!has_vector()) { for_each_online_cpu(cpu) per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED; - } else if (!check_vector_unaligned_access_emulated_all_cpus() && + } else if (!check_vector_unaligned_access_table() && + !check_vector_unaligned_access_emulated_all_cpus() && IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) { kthread_run(vec_check_unaligned_access_speed_all_cpus, NULL, "vec_check_unaligned_access_speed_all_cpus"); @@ -408,10 +428,8 @@ static int __init check_unaligned_access_all_cpus(void) * Setup hotplug callbacks for any new CPUs that come online or go * offline. */ -#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online", riscv_online_cpu, riscv_offline_cpu); -#endif cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online", riscv_online_cpu_vec, NULL);
Probing unaligned accesses on boot is time consuming. Provide a function which will be used to look up the access type in a table by id registers. Vendors which provide table entries can then skip the probing. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> --- arch/riscv/kernel/unaligned_access_speed.c | 114 ++++++++++++--------- 1 file changed, 66 insertions(+), 48 deletions(-)