diff mbox series

[7/9] riscv: Prepare for unaligned access type table lookups

Message ID 20250207161939.46139-18-ajones@ventanamicro.com (mailing list archive)
State Superseded
Headers show
Series riscv: Unaligned access speed probing fixes and skipping | expand

Checks

Context Check Description
bjorn/pre-ci_am success Success
bjorn/build-rv32-defconfig success build-rv32-defconfig
bjorn/build-rv64-clang-allmodconfig success build-rv64-clang-allmodconfig
bjorn/build-rv64-gcc-allmodconfig success build-rv64-gcc-allmodconfig
bjorn/build-rv64-nommu-k210-defconfig success build-rv64-nommu-k210-defconfig
bjorn/build-rv64-nommu-k210-virt success build-rv64-nommu-k210-virt
bjorn/checkpatch warning checkpatch
bjorn/dtb-warn-rv64 success dtb-warn-rv64
bjorn/header-inline success header-inline
bjorn/kdoc success kdoc
bjorn/module-param success module-param
bjorn/verify-fixes success verify-fixes
bjorn/verify-signedoff success verify-signedoff

Commit Message

Andrew Jones Feb. 7, 2025, 4:19 p.m. UTC
Probing unaligned accesses on boot is time consuming. Provide a
function which will be used to look up the access type in a table
by id registers. Vendors which provide table entries can then skip
the probing.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
---
 arch/riscv/kernel/unaligned_access_speed.c | 114 ++++++++++++---------
 1 file changed, 66 insertions(+), 48 deletions(-)

Comments

Charlie Jenkins Feb. 8, 2025, 1:22 a.m. UTC | #1
On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
> Probing unaligned accesses on boot is time consuming. Provide a
> function which will be used to look up the access type in a table
> by id registers. Vendors which provide table entries can then skip
> the probing.

The access checker in my experience is only time consuming on slow
hardware. Hardware that supports fast unaligned accesses isn't really
impacted by this? Avoiding a list of hardware that has slow/fast
unaligned accesses in the kernel was the main reason for dynamically
checking. We did introduce the config option to compile the kernel with
assumed slow/fast accesses, which of course has the downside of
recompiling the kernel and I assume that you already considered that.

Instead of having a table in the kernel, something that would be more
platform agnostic would be to have an extension that signals this
information. That seems like it would accomplish the same goal and
leverage the existing infrastructure in the kernel, albeit with the need
to make a new extension.

- Charlie
Andrew Jones Feb. 10, 2025, 9:43 a.m. UTC | #2
On Fri, Feb 07, 2025 at 05:22:52PM -0800, Charlie Jenkins wrote:
> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
> > Probing unaligned accesses on boot is time consuming. Provide a
> > function which will be used to look up the access type in a table
> > by id registers. Vendors which provide table entries can then skip
> > the probing.
> 
> The access checker in my experience is only time consuming on slow
> hardware. Hardware that supports fast unaligned accesses isn't really
> impacted by this?

That's true, but...

> Avoiding a list of hardware that has slow/fast
> unaligned accesses in the kernel was the main reason for dynamically
> checking.

...I'm not sure why we should try to avoid determining hardware support
by its description when a description can be provided.

> We did introduce the config option to compile the kernel with
> assumed slow/fast accesses, which of course has the downside of
> recompiling the kernel and I assume that you already considered that.

yup

> 
> Instead of having a table in the kernel, something that would be more
> platform agnostic would be to have an extension that signals this
> information. That seems like it would accomplish the same goal and
> leverage the existing infrastructure in the kernel, albeit with the need
> to make a new extension.

Yes, I agree that another profile "named feature" may be the best
approach. I'll consider proposing one, but [1] implies there may be
some resistance to creating something like that.

[1] https://github.com/riscv/riscv-isa-manual/issues/1611

Thanks,
drew
Anup Patel Feb. 10, 2025, 10:16 a.m. UTC | #3
On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
> > Probing unaligned accesses on boot is time consuming. Provide a
> > function which will be used to look up the access type in a table
> > by id registers. Vendors which provide table entries can then skip
> > the probing.
>
> The access checker in my experience is only time consuming on slow
> hardware. Hardware that supports fast unaligned accesses isn't really
> impacted by this? Avoiding a list of hardware that has slow/fast
> unaligned accesses in the kernel was the main reason for dynamically
> checking. We did introduce the config option to compile the kernel with
> assumed slow/fast accesses, which of course has the downside of
> recompiling the kernel and I assume that you already considered that.

The kconfig option does not align with the vision of running the same
kernel image across platforms.

>
> Instead of having a table in the kernel, something that would be more
> platform agnostic would be to have an extension that signals this
> information. That seems like it would accomplish the same goal and
> leverage the existing infrastructure in the kernel, albeit with the need
> to make a new extension.
>

IMO, expecting an ISA extension to be defined for all possible
microarchitectural choices is not going to scale so it is better
to have infrastructure in kernel itself to infer microarchitectural
choices based on RISC-V implementation ID.

Regards,
Anup
Clément Léger Feb. 10, 2025, 11:07 a.m. UTC | #4
On 10/02/2025 11:16, Anup Patel wrote:
> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
>>
>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
>>> Probing unaligned accesses on boot is time consuming. Provide a
>>> function which will be used to look up the access type in a table
>>> by id registers. Vendors which provide table entries can then skip
>>> the probing.
>>
>> The access checker in my experience is only time consuming on slow
>> hardware. Hardware that supports fast unaligned accesses isn't really
>> impacted by this? Avoiding a list of hardware that has slow/fast
>> unaligned accesses in the kernel was the main reason for dynamically
>> checking. We did introduce the config option to compile the kernel with
>> assumed slow/fast accesses, which of course has the downside of
>> recompiling the kernel and I assume that you already considered that.
> 
> The kconfig option does not align with the vision of running the same
> kernel image across platforms.

I'd would be advocating to remove compile time options as well and use
another way to skip the probe (see below).

> 
>>
>> Instead of having a table in the kernel, something that would be more
>> platform agnostic would be to have an extension that signals this
>> information. That seems like it would accomplish the same goal and
>> leverage the existing infrastructure in the kernel, albeit with the need
>> to make a new extension.
>>
> 
> IMO, expecting an ISA extension to be defined for all possible
> microarchitectural choices is not going to scale so it is better
> to have infrastructure in kernel itself to infer microarchitectural
> choices based on RISC-V implementation ID.

Since adding an extension seems quite unlikely, and that a device-tree
property is likely DT centric and not applicable to ACPI as well, was a
command line argument considered ?

Thanks,

Clément

> 
> Regards,
> Anup
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Andrew Jones Feb. 10, 2025, 2:06 p.m. UTC | #5
On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote:
> 
> 
> On 10/02/2025 11:16, Anup Patel wrote:
> > On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >>
> >> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
> >>> Probing unaligned accesses on boot is time consuming. Provide a
> >>> function which will be used to look up the access type in a table
> >>> by id registers. Vendors which provide table entries can then skip
> >>> the probing.
> >>
> >> The access checker in my experience is only time consuming on slow
> >> hardware. Hardware that supports fast unaligned accesses isn't really
> >> impacted by this? Avoiding a list of hardware that has slow/fast
> >> unaligned accesses in the kernel was the main reason for dynamically
> >> checking. We did introduce the config option to compile the kernel with
> >> assumed slow/fast accesses, which of course has the downside of
> >> recompiling the kernel and I assume that you already considered that.
> > 
> > The kconfig option does not align with the vision of running the same
> > kernel image across platforms.
> 
> I'd would be advocating to remove compile time options as well and use
> another way to skip the probe (see below).
> 
> > 
> >>
> >> Instead of having a table in the kernel, something that would be more
> >> platform agnostic would be to have an extension that signals this
> >> information. That seems like it would accomplish the same goal and
> >> leverage the existing infrastructure in the kernel, albeit with the need
> >> to make a new extension.
> >>
> > 
> > IMO, expecting an ISA extension to be defined for all possible
> > microarchitectural choices is not going to scale so it is better
> > to have infrastructure in kernel itself to infer microarchitectural
> > choices based on RISC-V implementation ID.
> 
> Since adding an extension seems quite unlikely, and that a device-tree
> property is likely DT centric and not applicable to ACPI as well, was a
> command line argument considered ?
>

I did consider adding a command line option in addition to the table,
allowing platforms which neither have a table entry [yet] nor want to do
the speed test, to set whatever they like. In the end, I dropped it, since
I don't have a use case at this time. However, if we really don't want a
table, then I can look into the command line option instead.

Thanks,
drew
Clément Léger Feb. 10, 2025, 2:20 p.m. UTC | #6
On 10/02/2025 15:06, Andrew Jones wrote:
> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote:
>>
>>
>> On 10/02/2025 11:16, Anup Patel wrote:
>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
>>>>
>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
>>>>> Probing unaligned accesses on boot is time consuming. Provide a
>>>>> function which will be used to look up the access type in a table
>>>>> by id registers. Vendors which provide table entries can then skip
>>>>> the probing.
>>>>
>>>> The access checker in my experience is only time consuming on slow
>>>> hardware. Hardware that supports fast unaligned accesses isn't really
>>>> impacted by this? Avoiding a list of hardware that has slow/fast
>>>> unaligned accesses in the kernel was the main reason for dynamically
>>>> checking. We did introduce the config option to compile the kernel with
>>>> assumed slow/fast accesses, which of course has the downside of
>>>> recompiling the kernel and I assume that you already considered that.
>>>
>>> The kconfig option does not align with the vision of running the same
>>> kernel image across platforms.
>>
>> I'd would be advocating to remove compile time options as well and use
>> another way to skip the probe (see below).
>>
>>>
>>>>
>>>> Instead of having a table in the kernel, something that would be more
>>>> platform agnostic would be to have an extension that signals this
>>>> information. That seems like it would accomplish the same goal and
>>>> leverage the existing infrastructure in the kernel, albeit with the need
>>>> to make a new extension.
>>>>
>>>
>>> IMO, expecting an ISA extension to be defined for all possible
>>> microarchitectural choices is not going to scale so it is better
>>> to have infrastructure in kernel itself to infer microarchitectural
>>> choices based on RISC-V implementation ID.
>>
>> Since adding an extension seems quite unlikely, and that a device-tree
>> property is likely DT centric and not applicable to ACPI as well, was a
>> command line argument considered ?
>>
> 
> I did consider adding a command line option in addition to the table,
> allowing platforms which neither have a table entry [yet] nor want to do
> the speed test, to set whatever they like. In the end, I dropped it, since
> I don't have a use case at this time. However, if we really don't want a
> table, then I can look into the command line option instead.

Sorry if I wasn't clear, I wasn't considering this as a replacement for
your table but rather as a replacement to Charlie's compile time define
to skip misaligned speed probing since it is like "lpj=<x>". You can
specify it on command line if you want to skip the loop time detection
of loops per jiffies and have faster boot.

Regarding your table, it feels like a bit going back to old hardcoded
platform description ;). I think some kind of auto-detection of speed
(not builtin the kernel) for platforms could be good as well to skip
probing.

A DT property also seems ok to me since the goal is to describe
hardware. Would a common DT/ACPI property be appropriate ? The
device_property API unified both so if we used some common property to
describe the misaligned access speed (both in DT cpu node/ ACPI CPU
device package), we could keep a single parsing method. But I'm no ACPI
expert so I don't know if that really make sense.

Thanks,

Clément

> 
> Thanks,
> drew
Charlie Jenkins Feb. 10, 2025, 5:10 p.m. UTC | #7
On Mon, Feb 10, 2025 at 10:43:18AM +0100, Andrew Jones wrote:
> On Fri, Feb 07, 2025 at 05:22:52PM -0800, Charlie Jenkins wrote:
> > On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
> > > Probing unaligned accesses on boot is time consuming. Provide a
> > > function which will be used to look up the access type in a table
> > > by id registers. Vendors which provide table entries can then skip
> > > the probing.
> > 
> > The access checker in my experience is only time consuming on slow
> > hardware. Hardware that supports fast unaligned accesses isn't really
> > impacted by this?
> 
> That's true, but...
> 
> > Avoiding a list of hardware that has slow/fast
> > unaligned accesses in the kernel was the main reason for dynamically
> > checking.
> 
> ...I'm not sure why we should try to avoid determining hardware support
> by its description when a description can be provided.

I worry about scalability of this. This to me seems like a slippery
slope of hardcoding performance tables into the kernel. There are a lot
of riscv vendors and allowing anybody to add a table to the kernel to
dynamically change behavior specifically for their hardware could become
a maintainability nightmare. Avoiding this maintainability issue was the
motivation for the runtime checker.

> 
> > We did introduce the config option to compile the kernel with
> > assumed slow/fast accesses, which of course has the downside of
> > recompiling the kernel and I assume that you already considered that.
> 
> yup
> 
> > 
> > Instead of having a table in the kernel, something that would be more
> > platform agnostic would be to have an extension that signals this
> > information. That seems like it would accomplish the same goal and
> > leverage the existing infrastructure in the kernel, albeit with the need
> > to make a new extension.
> 
> Yes, I agree that another profile "named feature" may be the best
> approach. I'll consider proposing one, but [1] implies there may be

Yeah that thread does highlight the unfortunate way that riscv has
evolved, but I hope that wouldn't be a prohibiting factor here.

- Charlie

> some resistance to creating something like that.
> 
> [1] https://github.com/riscv/riscv-isa-manual/issues/1611
> 
> Thanks,
> drew
Charlie Jenkins Feb. 10, 2025, 5:19 p.m. UTC | #8
On Mon, Feb 10, 2025 at 03:46:46PM +0530, Anup Patel wrote:
> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >
> > On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
> > > Probing unaligned accesses on boot is time consuming. Provide a
> > > function which will be used to look up the access type in a table
> > > by id registers. Vendors which provide table entries can then skip
> > > the probing.
> >
> > The access checker in my experience is only time consuming on slow
> > hardware. Hardware that supports fast unaligned accesses isn't really
> > impacted by this? Avoiding a list of hardware that has slow/fast
> > unaligned accesses in the kernel was the main reason for dynamically
> > checking. We did introduce the config option to compile the kernel with
> > assumed slow/fast accesses, which of course has the downside of
> > recompiling the kernel and I assume that you already considered that.
> 
> The kconfig option does not align with the vision of running the same
> kernel image across platforms.

I just don't think that vision is realistic.

I am a proponent for compile time defines because ri ght now we are
catering the kernel to both microcontrollers and for high performance
platforms. I am in favor of having a set of configur ations that are
ideal for these microcontrollers and a different set for high
performance platforms. This is where the RVI profile s would ideally
come in, having different configs for different profiles that target low
performance/high performance.

Compiler optimizations for extensions are not possib le to do by just
having these different methods of selecting at runti me. By enabling
extra extensions like the bitmanip extensions during compilation via a
config flag we can optimize the entire kernel. It is not possible to
push all optimizations off to runtime detection.

> 
> >
> > Instead of having a table in the kernel, something that would be more
> > platform agnostic would be to have an extension that signals this
> > information. That seems like it would accomplish the same goal and
> > leverage the existing infrastructure in the kernel, albeit with the need
> > to make a new extension.
> >
> 
> IMO, expecting an ISA extension to be defined for all possible
> microarchitectural choices is not going to scale so it is better
> to have infrastructure in kernel itself to infer microarchitectural
> choices based on RISC-V implementation ID.

How is keeping tables in the kernel for all microarchitectural details
any more scalable than having extensions that do the same thing? I would
argue that having it in the kernel is less scalable since it needs to be
described for all implementation IDs, and all changes require going
through the kernel review process. Dynamic probing avoids these issues.
Having an extension has the one-time process of getting the extension
into something like a profile, but then anybody could use it without
needing a kernel patch.

- Charlie

> 
> Regards,
> Anup
Charlie Jenkins Feb. 10, 2025, 5:20 p.m. UTC | #9
On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote:
> 
> 
> On 10/02/2025 15:06, Andrew Jones wrote:
> > On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote:
> >>
> >>
> >> On 10/02/2025 11:16, Anup Patel wrote:
> >>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >>>>
> >>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
> >>>>> Probing unaligned accesses on boot is time consuming. Provide a
> >>>>> function which will be used to look up the access type in a table
> >>>>> by id registers. Vendors which provide table entries can then skip
> >>>>> the probing.
> >>>>
> >>>> The access checker in my experience is only time consuming on slow
> >>>> hardware. Hardware that supports fast unaligned accesses isn't really
> >>>> impacted by this? Avoiding a list of hardware that has slow/fast
> >>>> unaligned accesses in the kernel was the main reason for dynamically
> >>>> checking. We did introduce the config option to compile the kernel with
> >>>> assumed slow/fast accesses, which of course has the downside of
> >>>> recompiling the kernel and I assume that you already considered that.
> >>>
> >>> The kconfig option does not align with the vision of running the same
> >>> kernel image across platforms.
> >>
> >> I'd would be advocating to remove compile time options as well and use
> >> another way to skip the probe (see below).
> >>
> >>>
> >>>>
> >>>> Instead of having a table in the kernel, something that would be more
> >>>> platform agnostic would be to have an extension that signals this
> >>>> information. That seems like it would accomplish the same goal and
> >>>> leverage the existing infrastructure in the kernel, albeit with the need
> >>>> to make a new extension.
> >>>>
> >>>
> >>> IMO, expecting an ISA extension to be defined for all possible
> >>> microarchitectural choices is not going to scale so it is better
> >>> to have infrastructure in kernel itself to infer microarchitectural
> >>> choices based on RISC-V implementation ID.
> >>
> >> Since adding an extension seems quite unlikely, and that a device-tree
> >> property is likely DT centric and not applicable to ACPI as well, was a
> >> command line argument considered ?
> >>
> > 
> > I did consider adding a command line option in addition to the table,
> > allowing platforms which neither have a table entry [yet] nor want to do
> > the speed test, to set whatever they like. In the end, I dropped it, since
> > I don't have a use case at this time. However, if we really don't want a
> > table, then I can look into the command line option instead.
> 
> Sorry if I wasn't clear, I wasn't considering this as a replacement for
> your table but rather as a replacement to Charlie's compile time define
> to skip misaligned speed probing since it is like "lpj=<x>". You can
> specify it on command line if you want to skip the loop time detection
> of loops per jiffies and have faster boot.

Jesse sent out a patch for a kernel parameter to set the access speed to
whatever is desired [1].

- Charlie

> -}
> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
> -static void __init check_unaligned_access_speed_all_cpus(void)
> -{
> -}
> -#endif
> -
>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
>  static void check_vector_unaligned_access(struct work_struct *work __always_unused)
>  {
> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway
>  }
>  #endif
>  
> +static bool check_vector_unaligned_access_table(void)
> +{
> +	return false;
> +}
> +
>  static int riscv_online_cpu_vec(unsigned int cpu)
>  {
>  	if (!has_vector()) {
> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu)
>  		return 0;
>  	}
>  
> +	if (check_vector_unaligned_access_table())
> +		return 0;
> +
>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
>  	if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN)
>  		return 0;
> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void)
>  {
>  	int cpu;
>  
> -	if (!check_unaligned_access_emulated_all_cpus())
> +	if (!check_unaligned_access_table() &&
> +	    !check_unaligned_access_emulated_all_cpus())
>  		check_unaligned_access_speed_all_cpus();
>  
>  	if (!has_vector()) {
>  		for_each_online_cpu(cpu)
>  			per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
> -	} else if (!check_vector_unaligned_access_emulated_all_cpus() &&
> +	} else if (!check_vector_unaligned_access_table() &&
> +		   !check_vector_unaligned_access_emulated_all_cpus() &&
>  		   IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
>  		kthread_run(vec_check_unaligned_access_speed_all_cpus,
>  			    NULL, "vec_check_unaligned_access_speed_all_cpus");

> 
> Regarding your table, it feels like a bit going back to old hardcoded
> platform description ;). I think some kind of auto-detection of speed
> (not builtin the kernel) for platforms could be good as well to skip
> probing.
> 
> A DT property also seems ok to me since the goal is to describe
> hardware. Would a common DT/ACPI property be appropriate ? The
> device_property API unified both so if we used some common property to
> describe the misaligned access speed (both in DT cpu node/ ACPI CPU
> device package), we could keep a single parsing method. But I'm no ACPI
> expert so I don't know if that really make sense.
> 
> Thanks,
> 
> Clément
> 
> > 
> > Thanks,
> > drew
>
Clément Léger Feb. 10, 2025, 8:37 p.m. UTC | #10
On 10/02/2025 18:19, Charlie Jenkins wrote:
> On Mon, Feb 10, 2025 at 03:46:46PM +0530, Anup Patel wrote:
>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
>>>
>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
>>>> Probing unaligned accesses on boot is time consuming. Provide a
>>>> function which will be used to look up the access type in a table
>>>> by id registers. Vendors which provide table entries can then skip
>>>> the probing.
>>>
>>> The access checker in my experience is only time consuming on slow
>>> hardware. Hardware that supports fast unaligned accesses isn't really
>>> impacted by this? Avoiding a list of hardware that has slow/fast
>>> unaligned accesses in the kernel was the main reason for dynamically
>>> checking. We did introduce the config option to compile the kernel with
>>> assumed slow/fast accesses, which of course has the downside of
>>> recompiling the kernel and I assume that you already considered that.
>>
>> The kconfig option does not align with the vision of running the same
>> kernel image across platforms.
> 
> I just don't think that vision is realistic.
> 
> I am a proponent for compile time defines because ri ght now we are
> catering the kernel to both microcontrollers and for high performance
> platforms. I am in favor of having a set of configur ations that are
> ideal for these microcontrollers and a different set for high
> performance platforms. This is where the RVI profile s would ideally
> come in, having different configs for different profiles that target low
> performance/high performance.
> 
> Compiler optimizations for extensions are not possib le to do by just
> having these different methods of selecting at runti me. By enabling
> extra extensions like the bitmanip extensions during compilation via a
> config flag we can optimize the entire kernel. It is not possible to
> push all optimizations off to runtime detection.

While this might be true for the bitmanip extension and other extensions
that the compiler can take advantage of, that isn't true for the
misaligned speed probing code. The only meaningful misaligned access
configuration option for kernel "speed" optimization is
RISCV_EFFICIENT_UNALIGNED_ACCESS (which is ironically not easily
selectable since it is under NON_PORTABLE).

Currently all the config options that have been added around misaligned
support "just" allows to get rid of the probing code at compile time and
set a predefined speed. That does not really improve the kernel
performance itself, just allows for a faster boot. That could as well be
supported using a command line option as suggested.

That being said, I agree that some additional configuration options
should be added to enable additional extension support at compile time,
enabling a faster kernel. Having a single image for all hardware is as
you said not realistic but profiles configuration as you proposed might
be the key for this support.

Clément

> 
>>
>>>
>>> Instead of having a table in the kernel, something that would be more
>>> platform agnostic would be to have an extension that signals this
>>> information. That seems like it would accomplish the same goal and
>>> leverage the existing infrastructure in the kernel, albeit with the need
>>> to make a new extension.
>>>
>>
>> IMO, expecting an ISA extension to be defined for all possible
>> microarchitectural choices is not going to scale so it is better
>> to have infrastructure in kernel itself to infer microarchitectural
>> choices based on RISC-V implementation ID.
> 
> How is keeping tables in the kernel for all microarchitectural details
> any more scalable than having extensions that do the same thing? I would
> argue that having it in the kernel is less scalable since it needs to be
> described for all implementation IDs, and all changes require going
> through the kernel review process. Dynamic probing avoids these issues.
> Having an extension has the one-time process of getting the extension
> into something like a profile, but then anybody could use it without
> needing a kernel patch.
> 
> - Charlie
> 
>>
>> Regards,
>> Anup
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Clément Léger Feb. 10, 2025, 8:42 p.m. UTC | #11
On 10/02/2025 18:20, Charlie Jenkins wrote:
> On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote:
>>
>>
>> On 10/02/2025 15:06, Andrew Jones wrote:
>>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote:
>>>>
>>>>
>>>> On 10/02/2025 11:16, Anup Patel wrote:
>>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
>>>>>>
>>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
>>>>>>> Probing unaligned accesses on boot is time consuming. Provide a
>>>>>>> function which will be used to look up the access type in a table
>>>>>>> by id registers. Vendors which provide table entries can then skip
>>>>>>> the probing.
>>>>>>
>>>>>> The access checker in my experience is only time consuming on slow
>>>>>> hardware. Hardware that supports fast unaligned accesses isn't really
>>>>>> impacted by this? Avoiding a list of hardware that has slow/fast
>>>>>> unaligned accesses in the kernel was the main reason for dynamically
>>>>>> checking. We did introduce the config option to compile the kernel with
>>>>>> assumed slow/fast accesses, which of course has the downside of
>>>>>> recompiling the kernel and I assume that you already considered that.
>>>>>
>>>>> The kconfig option does not align with the vision of running the same
>>>>> kernel image across platforms.
>>>>
>>>> I'd would be advocating to remove compile time options as well and use
>>>> another way to skip the probe (see below).
>>>>
>>>>>
>>>>>>
>>>>>> Instead of having a table in the kernel, something that would be more
>>>>>> platform agnostic would be to have an extension that signals this
>>>>>> information. That seems like it would accomplish the same goal and
>>>>>> leverage the existing infrastructure in the kernel, albeit with the need
>>>>>> to make a new extension.
>>>>>>
>>>>>
>>>>> IMO, expecting an ISA extension to be defined for all possible
>>>>> microarchitectural choices is not going to scale so it is better
>>>>> to have infrastructure in kernel itself to infer microarchitectural
>>>>> choices based on RISC-V implementation ID.
>>>>
>>>> Since adding an extension seems quite unlikely, and that a device-tree
>>>> property is likely DT centric and not applicable to ACPI as well, was a
>>>> command line argument considered ?
>>>>
>>>
>>> I did consider adding a command line option in addition to the table,
>>> allowing platforms which neither have a table entry [yet] nor want to do
>>> the speed test, to set whatever they like. In the end, I dropped it, since
>>> I don't have a use case at this time. However, if we really don't want a
>>> table, then I can look into the command line option instead.
>>
>> Sorry if I wasn't clear, I wasn't considering this as a replacement for
>> your table but rather as a replacement to Charlie's compile time define
>> to skip misaligned speed probing since it is like "lpj=<x>". You can
>> specify it on command line if you want to skip the loop time detection
>> of loops per jiffies and have faster boot.
> 
> Jesse sent out a patch for a kernel parameter to set the access speed to
> whatever is desired [1].

Hey Charlie,

Thanks but it seems you forgot to add the link ?

Having configuration option + command line option seems like something
particularly heavy for such feature. The ifdefery/config options
involved in the misaligned probing code is already quite complicated. If
another mean to specify the misaligned speed access is added, I think
all configuration options to set the speed of accesses can then be
removed and just keep the command line. That will certainly simplify the
ifdef/config options.

Clément

> 
> - Charlie
> 
>> -}
>> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
>> -static void __init check_unaligned_access_speed_all_cpus(void)
>> -{
>> -}
>> -#endif
>> -
>>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
>>  static void check_vector_unaligned_access(struct work_struct *work __always_unused)
>>  {
>> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway
>>  }
>>  #endif
>>  
>> +static bool check_vector_unaligned_access_table(void)
>> +{
>> +	return false;
>> +}
>> +
>>  static int riscv_online_cpu_vec(unsigned int cpu)
>>  {
>>  	if (!has_vector()) {
>> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu)
>>  		return 0;
>>  	}
>>  
>> +	if (check_vector_unaligned_access_table())
>> +		return 0;
>> +
>>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
>>  	if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN)
>>  		return 0;
>> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void)
>>  {
>>  	int cpu;
>>  
>> -	if (!check_unaligned_access_emulated_all_cpus())
>> +	if (!check_unaligned_access_table() &&
>> +	    !check_unaligned_access_emulated_all_cpus())
>>  		check_unaligned_access_speed_all_cpus();
>>  
>>  	if (!has_vector()) {
>>  		for_each_online_cpu(cpu)
>>  			per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
>> -	} else if (!check_vector_unaligned_access_emulated_all_cpus() &&
>> +	} else if (!check_vector_unaligned_access_table() &&
>> +		   !check_vector_unaligned_access_emulated_all_cpus() &&
>>  		   IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
>>  		kthread_run(vec_check_unaligned_access_speed_all_cpus,
>>  			    NULL, "vec_check_unaligned_access_speed_all_cpus");
> 
>>
>> Regarding your table, it feels like a bit going back to old hardcoded
>> platform description ;). I think some kind of auto-detection of speed
>> (not builtin the kernel) for platforms could be good as well to skip
>> probing.
>>
>> A DT property also seems ok to me since the goal is to describe
>> hardware. Would a common DT/ACPI property be appropriate ? The
>> device_property API unified both so if we used some common property to
>> describe the misaligned access speed (both in DT cpu node/ ACPI CPU
>> device package), we could keep a single parsing method. But I'm no ACPI
>> expert so I don't know if that really make sense.
>>
>> Thanks,
>>
>> Clément
>>
>>>
>>> Thanks,
>>> drew
>>
Charlie Jenkins Feb. 10, 2025, 8:53 p.m. UTC | #12
On Mon, Feb 10, 2025 at 09:42:25PM +0100, Clément Léger wrote:
> 
> 
> On 10/02/2025 18:20, Charlie Jenkins wrote:
> > On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote:
> >>
> >>
> >> On 10/02/2025 15:06, Andrew Jones wrote:
> >>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote:
> >>>>
> >>>>
> >>>> On 10/02/2025 11:16, Anup Patel wrote:
> >>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >>>>>>
> >>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
> >>>>>>> Probing unaligned accesses on boot is time consuming. Provide a
> >>>>>>> function which will be used to look up the access type in a table
> >>>>>>> by id registers. Vendors which provide table entries can then skip
> >>>>>>> the probing.
> >>>>>>
> >>>>>> The access checker in my experience is only time consuming on slow
> >>>>>> hardware. Hardware that supports fast unaligned accesses isn't really
> >>>>>> impacted by this? Avoiding a list of hardware that has slow/fast
> >>>>>> unaligned accesses in the kernel was the main reason for dynamically
> >>>>>> checking. We did introduce the config option to compile the kernel with
> >>>>>> assumed slow/fast accesses, which of course has the downside of
> >>>>>> recompiling the kernel and I assume that you already considered that.
> >>>>>
> >>>>> The kconfig option does not align with the vision of running the same
> >>>>> kernel image across platforms.
> >>>>
> >>>> I'd would be advocating to remove compile time options as well and use
> >>>> another way to skip the probe (see below).
> >>>>
> >>>>>
> >>>>>>
> >>>>>> Instead of having a table in the kernel, something that would be more
> >>>>>> platform agnostic would be to have an extension that signals this
> >>>>>> information. That seems like it would accomplish the same goal and
> >>>>>> leverage the existing infrastructure in the kernel, albeit with the need
> >>>>>> to make a new extension.
> >>>>>>
> >>>>>
> >>>>> IMO, expecting an ISA extension to be defined for all possible
> >>>>> microarchitectural choices is not going to scale so it is better
> >>>>> to have infrastructure in kernel itself to infer microarchitectural
> >>>>> choices based on RISC-V implementation ID.
> >>>>
> >>>> Since adding an extension seems quite unlikely, and that a device-tree
> >>>> property is likely DT centric and not applicable to ACPI as well, was a
> >>>> command line argument considered ?
> >>>>
> >>>
> >>> I did consider adding a command line option in addition to the table,
> >>> allowing platforms which neither have a table entry [yet] nor want to do
> >>> the speed test, to set whatever they like. In the end, I dropped it, since
> >>> I don't have a use case at this time. However, if we really don't want a
> >>> table, then I can look into the command line option instead.
> >>
> >> Sorry if I wasn't clear, I wasn't considering this as a replacement for
> >> your table but rather as a replacement to Charlie's compile time define
> >> to skip misaligned speed probing since it is like "lpj=<x>". You can
> >> specify it on command line if you want to skip the loop time detection
> >> of loops per jiffies and have faster boot.
> > 
> > Jesse sent out a patch for a kernel parameter to set the access speed to
> > whatever is desired [1].
> 
> Hey Charlie,
> 
> Thanks but it seems you forgot to add the link ?

Oops, I frequently do that...

https://lore.kernel.org/linux-riscv/20240805173816.3722002-1-jesse@rivosinc.com/

> 
> Having configuration option + command line option seems like something
> particularly heavy for such feature. The ifdefery/config options
> involved in the misaligned probing code is already quite complicated. If
> another mean to specify the misaligned speed access is added, I think
> all configuration options to set the speed of accesses can then be
> removed and just keep the command line. That will certainly simplify the
> ifdef/config options.

Yeah that's why it didn't get merged because it felt like overkill. I
responded on the thread to Anup as why I would prefer config options. It
just comes down to config options being required to enable compiler
features. The kernel is only built with rv64gc and usage of all other
extensions requires hand written assembly. There are easy performance
gains when compiling the kernel with rv64gc_zba_zbb_zbkb etc.
Performance focused kernels will need to be recompiled anyway so I am of
the opinion that grouping in other performance features as config
options like this is the easiest thing to do and reduces the amount of
code in the kernel.

- Charlie

> 
> Clément
> 
> > 
> > - Charlie
> > 
> >> -}
> >> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
> >> -static void __init check_unaligned_access_speed_all_cpus(void)
> >> -{
> >> -}
> >> -#endif
> >> -
> >>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
> >>  static void check_vector_unaligned_access(struct work_struct *work __always_unused)
> >>  {
> >> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway
> >>  }
> >>  #endif
> >>  
> >> +static bool check_vector_unaligned_access_table(void)
> >> +{
> >> +	return false;
> >> +}
> >> +
> >>  static int riscv_online_cpu_vec(unsigned int cpu)
> >>  {
> >>  	if (!has_vector()) {
> >> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu)
> >>  		return 0;
> >>  	}
> >>  
> >> +	if (check_vector_unaligned_access_table())
> >> +		return 0;
> >> +
> >>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
> >>  	if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN)
> >>  		return 0;
> >> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void)
> >>  {
> >>  	int cpu;
> >>  
> >> -	if (!check_unaligned_access_emulated_all_cpus())
> >> +	if (!check_unaligned_access_table() &&
> >> +	    !check_unaligned_access_emulated_all_cpus())
> >>  		check_unaligned_access_speed_all_cpus();
> >>  
> >>  	if (!has_vector()) {
> >>  		for_each_online_cpu(cpu)
> >>  			per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
> >> -	} else if (!check_vector_unaligned_access_emulated_all_cpus() &&
> >> +	} else if (!check_vector_unaligned_access_table() &&
> >> +		   !check_vector_unaligned_access_emulated_all_cpus() &&
> >>  		   IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
> >>  		kthread_run(vec_check_unaligned_access_speed_all_cpus,
> >>  			    NULL, "vec_check_unaligned_access_speed_all_cpus");
> > 
> >>
> >> Regarding your table, it feels like a bit going back to old hardcoded
> >> platform description ;). I think some kind of auto-detection of speed
> >> (not builtin the kernel) for platforms could be good as well to skip
> >> probing.
> >>
> >> A DT property also seems ok to me since the goal is to describe
> >> hardware. Would a common DT/ACPI property be appropriate ? The
> >> device_property API unified both so if we used some common property to
> >> describe the misaligned access speed (both in DT cpu node/ ACPI CPU
> >> device package), we could keep a single parsing method. But I'm no ACPI
> >> expert so I don't know if that really make sense.
> >>
> >> Thanks,
> >>
> >> Clément
> >>
> >>>
> >>> Thanks,
> >>> drew
> >>
>
Clément Léger Feb. 10, 2025, 8:57 p.m. UTC | #13
On 10/02/2025 21:53, Charlie Jenkins wrote:
> On Mon, Feb 10, 2025 at 09:42:25PM +0100, Clément Léger wrote:
>>
>>
>> On 10/02/2025 18:20, Charlie Jenkins wrote:
>>> On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote:
>>>>
>>>>
>>>> On 10/02/2025 15:06, Andrew Jones wrote:
>>>>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote:
>>>>>>
>>>>>>
>>>>>> On 10/02/2025 11:16, Anup Patel wrote:
>>>>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
>>>>>>>>
>>>>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
>>>>>>>>> Probing unaligned accesses on boot is time consuming. Provide a
>>>>>>>>> function which will be used to look up the access type in a table
>>>>>>>>> by id registers. Vendors which provide table entries can then skip
>>>>>>>>> the probing.
>>>>>>>>
>>>>>>>> The access checker in my experience is only time consuming on slow
>>>>>>>> hardware. Hardware that supports fast unaligned accesses isn't really
>>>>>>>> impacted by this? Avoiding a list of hardware that has slow/fast
>>>>>>>> unaligned accesses in the kernel was the main reason for dynamically
>>>>>>>> checking. We did introduce the config option to compile the kernel with
>>>>>>>> assumed slow/fast accesses, which of course has the downside of
>>>>>>>> recompiling the kernel and I assume that you already considered that.
>>>>>>>
>>>>>>> The kconfig option does not align with the vision of running the same
>>>>>>> kernel image across platforms.
>>>>>>
>>>>>> I'd would be advocating to remove compile time options as well and use
>>>>>> another way to skip the probe (see below).
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Instead of having a table in the kernel, something that would be more
>>>>>>>> platform agnostic would be to have an extension that signals this
>>>>>>>> information. That seems like it would accomplish the same goal and
>>>>>>>> leverage the existing infrastructure in the kernel, albeit with the need
>>>>>>>> to make a new extension.
>>>>>>>>
>>>>>>>
>>>>>>> IMO, expecting an ISA extension to be defined for all possible
>>>>>>> microarchitectural choices is not going to scale so it is better
>>>>>>> to have infrastructure in kernel itself to infer microarchitectural
>>>>>>> choices based on RISC-V implementation ID.
>>>>>>
>>>>>> Since adding an extension seems quite unlikely, and that a device-tree
>>>>>> property is likely DT centric and not applicable to ACPI as well, was a
>>>>>> command line argument considered ?
>>>>>>
>>>>>
>>>>> I did consider adding a command line option in addition to the table,
>>>>> allowing platforms which neither have a table entry [yet] nor want to do
>>>>> the speed test, to set whatever they like. In the end, I dropped it, since
>>>>> I don't have a use case at this time. However, if we really don't want a
>>>>> table, then I can look into the command line option instead.
>>>>
>>>> Sorry if I wasn't clear, I wasn't considering this as a replacement for
>>>> your table but rather as a replacement to Charlie's compile time define
>>>> to skip misaligned speed probing since it is like "lpj=<x>". You can
>>>> specify it on command line if you want to skip the loop time detection
>>>> of loops per jiffies and have faster boot.
>>>
>>> Jesse sent out a patch for a kernel parameter to set the access speed to
>>> whatever is desired [1].
>>
>> Hey Charlie,
>>
>> Thanks but it seems you forgot to add the link ?
> 
> Oops, I frequently do that...
> 
> https://lore.kernel.org/linux-riscv/20240805173816.3722002-1-jesse@rivosinc.com/
> 
>>
>> Having configuration option + command line option seems like something
>> particularly heavy for such feature. The ifdefery/config options
>> involved in the misaligned probing code is already quite complicated. If
>> another mean to specify the misaligned speed access is added, I think
>> all configuration options to set the speed of accesses can then be
>> removed and just keep the command line. That will certainly simplify the
>> ifdef/config options.
> 
> Yeah that's why it didn't get merged because it felt like overkill. I
> responded on the thread to Anup as why I would prefer config options. It
> just comes down to config options being required to enable compiler
> features. The kernel is only built with rv64gc and usage of all other
> extensions requires hand written assembly. There are easy performance
> gains when compiling the kernel with rv64gc_zba_zbb_zbkb etc.
> Performance focused kernels will need to be recompiled anyway so I am of
> the opinion that grouping in other performance features as config
> options like this is the easiest thing to do and reduces the amount of
> code in the kernel.

As answered on the other thread, totally agree, except for the
misaligned accesses probing config options ;). Ultimately, we need
profiles configuration, either via defconfigs that enables a bunch of
optimization via ISA extension or configuration options that groups
these config options.

Clément

> 
> - Charlie
> 
>>
>> Clément
>>
>>>
>>> - Charlie
>>>
>>>> -}
>>>> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
>>>> -static void __init check_unaligned_access_speed_all_cpus(void)
>>>> -{
>>>> -}
>>>> -#endif
>>>> -
>>>>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
>>>>  static void check_vector_unaligned_access(struct work_struct *work __always_unused)
>>>>  {
>>>> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway
>>>>  }
>>>>  #endif
>>>>  
>>>> +static bool check_vector_unaligned_access_table(void)
>>>> +{
>>>> +	return false;
>>>> +}
>>>> +
>>>>  static int riscv_online_cpu_vec(unsigned int cpu)
>>>>  {
>>>>  	if (!has_vector()) {
>>>> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu)
>>>>  		return 0;
>>>>  	}
>>>>  
>>>> +	if (check_vector_unaligned_access_table())
>>>> +		return 0;
>>>> +
>>>>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
>>>>  	if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN)
>>>>  		return 0;
>>>> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void)
>>>>  {
>>>>  	int cpu;
>>>>  
>>>> -	if (!check_unaligned_access_emulated_all_cpus())
>>>> +	if (!check_unaligned_access_table() &&
>>>> +	    !check_unaligned_access_emulated_all_cpus())
>>>>  		check_unaligned_access_speed_all_cpus();
>>>>  
>>>>  	if (!has_vector()) {
>>>>  		for_each_online_cpu(cpu)
>>>>  			per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
>>>> -	} else if (!check_vector_unaligned_access_emulated_all_cpus() &&
>>>> +	} else if (!check_vector_unaligned_access_table() &&
>>>> +		   !check_vector_unaligned_access_emulated_all_cpus() &&
>>>>  		   IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
>>>>  		kthread_run(vec_check_unaligned_access_speed_all_cpus,
>>>>  			    NULL, "vec_check_unaligned_access_speed_all_cpus");
>>>
>>>>
>>>> Regarding your table, it feels like a bit going back to old hardcoded
>>>> platform description ;). I think some kind of auto-detection of speed
>>>> (not builtin the kernel) for platforms could be good as well to skip
>>>> probing.
>>>>
>>>> A DT property also seems ok to me since the goal is to describe
>>>> hardware. Would a common DT/ACPI property be appropriate ? The
>>>> device_property API unified both so if we used some common property to
>>>> describe the misaligned access speed (both in DT cpu node/ ACPI CPU
>>>> device package), we could keep a single parsing method. But I'm no ACPI
>>>> expert so I don't know if that really make sense.
>>>>
>>>> Thanks,
>>>>
>>>> Clément
>>>>
>>>>>
>>>>> Thanks,
>>>>> drew
>>>>
>>
Charlie Jenkins Feb. 10, 2025, 9:13 p.m. UTC | #14
On Mon, Feb 10, 2025 at 09:57:26PM +0100, Clément Léger wrote:
> 
> 
> On 10/02/2025 21:53, Charlie Jenkins wrote:
> > On Mon, Feb 10, 2025 at 09:42:25PM +0100, Clément Léger wrote:
> >>
> >>
> >> On 10/02/2025 18:20, Charlie Jenkins wrote:
> >>> On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote:
> >>>>
> >>>>
> >>>> On 10/02/2025 15:06, Andrew Jones wrote:
> >>>>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 10/02/2025 11:16, Anup Patel wrote:
> >>>>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >>>>>>>>
> >>>>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
> >>>>>>>>> Probing unaligned accesses on boot is time consuming. Provide a
> >>>>>>>>> function which will be used to look up the access type in a table
> >>>>>>>>> by id registers. Vendors which provide table entries can then skip
> >>>>>>>>> the probing.
> >>>>>>>>
> >>>>>>>> The access checker in my experience is only time consuming on slow
> >>>>>>>> hardware. Hardware that supports fast unaligned accesses isn't really
> >>>>>>>> impacted by this? Avoiding a list of hardware that has slow/fast
> >>>>>>>> unaligned accesses in the kernel was the main reason for dynamically
> >>>>>>>> checking. We did introduce the config option to compile the kernel with
> >>>>>>>> assumed slow/fast accesses, which of course has the downside of
> >>>>>>>> recompiling the kernel and I assume that you already considered that.
> >>>>>>>
> >>>>>>> The kconfig option does not align with the vision of running the same
> >>>>>>> kernel image across platforms.
> >>>>>>
> >>>>>> I'd would be advocating to remove compile time options as well and use
> >>>>>> another way to skip the probe (see below).
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Instead of having a table in the kernel, something that would be more
> >>>>>>>> platform agnostic would be to have an extension that signals this
> >>>>>>>> information. That seems like it would accomplish the same goal and
> >>>>>>>> leverage the existing infrastructure in the kernel, albeit with the need
> >>>>>>>> to make a new extension.
> >>>>>>>>
> >>>>>>>
> >>>>>>> IMO, expecting an ISA extension to be defined for all possible
> >>>>>>> microarchitectural choices is not going to scale so it is better
> >>>>>>> to have infrastructure in kernel itself to infer microarchitectural
> >>>>>>> choices based on RISC-V implementation ID.
> >>>>>>
> >>>>>> Since adding an extension seems quite unlikely, and that a device-tree
> >>>>>> property is likely DT centric and not applicable to ACPI as well, was a
> >>>>>> command line argument considered ?
> >>>>>>
> >>>>>
> >>>>> I did consider adding a command line option in addition to the table,
> >>>>> allowing platforms which neither have a table entry [yet] nor want to do
> >>>>> the speed test, to set whatever they like. In the end, I dropped it, since
> >>>>> I don't have a use case at this time. However, if we really don't want a
> >>>>> table, then I can look into the command line option instead.
> >>>>
> >>>> Sorry if I wasn't clear, I wasn't considering this as a replacement for
> >>>> your table but rather as a replacement to Charlie's compile time define
> >>>> to skip misaligned speed probing since it is like "lpj=<x>". You can
> >>>> specify it on command line if you want to skip the loop time detection
> >>>> of loops per jiffies and have faster boot.
> >>>
> >>> Jesse sent out a patch for a kernel parameter to set the access speed to
> >>> whatever is desired [1].
> >>
> >> Hey Charlie,
> >>
> >> Thanks but it seems you forgot to add the link ?
> > 
> > Oops, I frequently do that...
> > 
> > https://lore.kernel.org/linux-riscv/20240805173816.3722002-1-jesse@rivosinc.com/
> > 
> >>
> >> Having configuration option + command line option seems like something
> >> particularly heavy for such feature. The ifdefery/config options
> >> involved in the misaligned probing code is already quite complicated. If
> >> another mean to specify the misaligned speed access is added, I think
> >> all configuration options to set the speed of accesses can then be
> >> removed and just keep the command line. That will certainly simplify the
> >> ifdef/config options.
> > 
> > Yeah that's why it didn't get merged because it felt like overkill. I
> > responded on the thread to Anup as why I would prefer config options. It
> > just comes down to config options being required to enable compiler
> > features. The kernel is only built with rv64gc and usage of all other
> > extensions requires hand written assembly. There are easy performance
> > gains when compiling the kernel with rv64gc_zba_zbb_zbkb etc.
> > Performance focused kernels will need to be recompiled anyway so I am of
> > the opinion that grouping in other performance features as config
> > options like this is the easiest thing to do and reduces the amount of
> > code in the kernel.
> 
> As answered on the other thread, totally agree, except for the
> misaligned accesses probing config options ;).

Oh! I have missed that response, where is that?

> Ultimately, we need
> profiles configuration, either via defconfigs that enables a bunch of
> optimization via ISA extension or configuration options that groups
> these config options.

Why do you agree with profile configs for other things but not for
misaligned access probing?

> 
> Clément
> 
> > 
> > - Charlie
> > 
> >>
> >> Clément
> >>
> >>>
> >>> - Charlie
> >>>
> >>>> -}
> >>>> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
> >>>> -static void __init check_unaligned_access_speed_all_cpus(void)
> >>>> -{
> >>>> -}
> >>>> -#endif
> >>>> -
> >>>>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
> >>>>  static void check_vector_unaligned_access(struct work_struct *work __always_unused)
> >>>>  {
> >>>> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway
> >>>>  }
> >>>>  #endif
> >>>>  
> >>>> +static bool check_vector_unaligned_access_table(void)
> >>>> +{
> >>>> +	return false;
> >>>> +}
> >>>> +
> >>>>  static int riscv_online_cpu_vec(unsigned int cpu)
> >>>>  {
> >>>>  	if (!has_vector()) {
> >>>> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu)
> >>>>  		return 0;
> >>>>  	}
> >>>>  
> >>>> +	if (check_vector_unaligned_access_table())
> >>>> +		return 0;
> >>>> +
> >>>>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
> >>>>  	if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN)
> >>>>  		return 0;
> >>>> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void)
> >>>>  {
> >>>>  	int cpu;
> >>>>  
> >>>> -	if (!check_unaligned_access_emulated_all_cpus())
> >>>> +	if (!check_unaligned_access_table() &&
> >>>> +	    !check_unaligned_access_emulated_all_cpus())
> >>>>  		check_unaligned_access_speed_all_cpus();
> >>>>  
> >>>>  	if (!has_vector()) {
> >>>>  		for_each_online_cpu(cpu)
> >>>>  			per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
> >>>> -	} else if (!check_vector_unaligned_access_emulated_all_cpus() &&
> >>>> +	} else if (!check_vector_unaligned_access_table() &&
> >>>> +		   !check_vector_unaligned_access_emulated_all_cpus() &&
> >>>>  		   IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
> >>>>  		kthread_run(vec_check_unaligned_access_speed_all_cpus,
> >>>>  			    NULL, "vec_check_unaligned_access_speed_all_cpus");
> >>>
> >>>>
> >>>> Regarding your table, it feels like a bit going back to old hardcoded
> >>>> platform description ;). I think some kind of auto-detection of speed
> >>>> (not builtin the kernel) for platforms could be good as well to skip
> >>>> probing.
> >>>>
> >>>> A DT property also seems ok to me since the goal is to describe
> >>>> hardware. Would a common DT/ACPI property be appropriate ? The
> >>>> device_property API unified both so if we used some common property to
> >>>> describe the misaligned access speed (both in DT cpu node/ ACPI CPU
> >>>> device package), we could keep a single parsing method. But I'm no ACPI
> >>>> expert so I don't know if that really make sense.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Clément
> >>>>
> >>>>>
> >>>>> Thanks,
> >>>>> drew
> >>>>
> >>
>
Anup Patel Feb. 11, 2025, 4:26 a.m. UTC | #15
On Tue, Feb 11, 2025 at 2:27 AM Clément Léger <cleger@rivosinc.com> wrote:
>
>
>
> On 10/02/2025 21:53, Charlie Jenkins wrote:
> > On Mon, Feb 10, 2025 at 09:42:25PM +0100, Clément Léger wrote:
> >>
> >>
> >> On 10/02/2025 18:20, Charlie Jenkins wrote:
> >>> On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote:
> >>>>
> >>>>
> >>>> On 10/02/2025 15:06, Andrew Jones wrote:
> >>>>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 10/02/2025 11:16, Anup Patel wrote:
> >>>>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >>>>>>>>
> >>>>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
> >>>>>>>>> Probing unaligned accesses on boot is time consuming. Provide a
> >>>>>>>>> function which will be used to look up the access type in a table
> >>>>>>>>> by id registers. Vendors which provide table entries can then skip
> >>>>>>>>> the probing.
> >>>>>>>>
> >>>>>>>> The access checker in my experience is only time consuming on slow
> >>>>>>>> hardware. Hardware that supports fast unaligned accesses isn't really
> >>>>>>>> impacted by this? Avoiding a list of hardware that has slow/fast
> >>>>>>>> unaligned accesses in the kernel was the main reason for dynamically
> >>>>>>>> checking. We did introduce the config option to compile the kernel with
> >>>>>>>> assumed slow/fast accesses, which of course has the downside of
> >>>>>>>> recompiling the kernel and I assume that you already considered that.
> >>>>>>>
> >>>>>>> The kconfig option does not align with the vision of running the same
> >>>>>>> kernel image across platforms.
> >>>>>>
> >>>>>> I'd would be advocating to remove compile time options as well and use
> >>>>>> another way to skip the probe (see below).
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Instead of having a table in the kernel, something that would be more
> >>>>>>>> platform agnostic would be to have an extension that signals this
> >>>>>>>> information. That seems like it would accomplish the same goal and
> >>>>>>>> leverage the existing infrastructure in the kernel, albeit with the need
> >>>>>>>> to make a new extension.
> >>>>>>>>
> >>>>>>>
> >>>>>>> IMO, expecting an ISA extension to be defined for all possible
> >>>>>>> microarchitectural choices is not going to scale so it is better
> >>>>>>> to have infrastructure in kernel itself to infer microarchitectural
> >>>>>>> choices based on RISC-V implementation ID.
> >>>>>>
> >>>>>> Since adding an extension seems quite unlikely, and that a device-tree
> >>>>>> property is likely DT centric and not applicable to ACPI as well, was a
> >>>>>> command line argument considered ?
> >>>>>>
> >>>>>
> >>>>> I did consider adding a command line option in addition to the table,
> >>>>> allowing platforms which neither have a table entry [yet] nor want to do
> >>>>> the speed test, to set whatever they like. In the end, I dropped it, since
> >>>>> I don't have a use case at this time. However, if we really don't want a
> >>>>> table, then I can look into the command line option instead.
> >>>>
> >>>> Sorry if I wasn't clear, I wasn't considering this as a replacement for
> >>>> your table but rather as a replacement to Charlie's compile time define
> >>>> to skip misaligned speed probing since it is like "lpj=<x>". You can
> >>>> specify it on command line if you want to skip the loop time detection
> >>>> of loops per jiffies and have faster boot.
> >>>
> >>> Jesse sent out a patch for a kernel parameter to set the access speed to
> >>> whatever is desired [1].
> >>
> >> Hey Charlie,
> >>
> >> Thanks but it seems you forgot to add the link ?
> >
> > Oops, I frequently do that...
> >
> > https://lore.kernel.org/linux-riscv/20240805173816.3722002-1-jesse@rivosinc.com/
> >
> >>
> >> Having configuration option + command line option seems like something
> >> particularly heavy for such feature. The ifdefery/config options
> >> involved in the misaligned probing code is already quite complicated. If
> >> another mean to specify the misaligned speed access is added, I think
> >> all configuration options to set the speed of accesses can then be
> >> removed and just keep the command line. That will certainly simplify the
> >> ifdef/config options.
> >
> > Yeah that's why it didn't get merged because it felt like overkill. I
> > responded on the thread to Anup as why I would prefer config options. It
> > just comes down to config options being required to enable compiler
> > features. The kernel is only built with rv64gc and usage of all other
> > extensions requires hand written assembly. There are easy performance
> > gains when compiling the kernel with rv64gc_zba_zbb_zbkb etc.
> > Performance focused kernels will need to be recompiled anyway so I am of
> > the opinion that grouping in other performance features as config
> > options like this is the easiest thing to do and reduces the amount of
> > code in the kernel.
>
> As answered on the other thread, totally agree, except for the
> misaligned accesses probing config options ;). Ultimately, we need
> profiles configuration, either via defconfigs that enables a bunch of
> optimization via ISA extension or configuration options that groups
> these config options.

I agree.

Kconfig options for ISA extensions are fine but not for
misaligned access. The kernel command-line option to skip
misaligned access probe is fine as well. I still think RISC-V
implementation ID based microarchitecture feature discovery
will be eventually required as well so why not add it now.

For profiles, how about having an incremental rva23.config
similar to 64-bit.config ? This will minimize the duplication
by re-using the same base defconfig.

Regards,
Anup



>
> Clément
>
> >
> > - Charlie
> >
> >>
> >> Clément
> >>
> >>>
> >>> - Charlie
> >>>
> >>>> -}
> >>>> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
> >>>> -static void __init check_unaligned_access_speed_all_cpus(void)
> >>>> -{
> >>>> -}
> >>>> -#endif
> >>>> -
> >>>>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
> >>>>  static void check_vector_unaligned_access(struct work_struct *work __always_unused)
> >>>>  {
> >>>> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway
> >>>>  }
> >>>>  #endif
> >>>>
> >>>> +static bool check_vector_unaligned_access_table(void)
> >>>> +{
> >>>> +  return false;
> >>>> +}
> >>>> +
> >>>>  static int riscv_online_cpu_vec(unsigned int cpu)
> >>>>  {
> >>>>    if (!has_vector()) {
> >>>> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu)
> >>>>            return 0;
> >>>>    }
> >>>>
> >>>> +  if (check_vector_unaligned_access_table())
> >>>> +          return 0;
> >>>> +
> >>>>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
> >>>>    if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN)
> >>>>            return 0;
> >>>> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void)
> >>>>  {
> >>>>    int cpu;
> >>>>
> >>>> -  if (!check_unaligned_access_emulated_all_cpus())
> >>>> +  if (!check_unaligned_access_table() &&
> >>>> +      !check_unaligned_access_emulated_all_cpus())
> >>>>            check_unaligned_access_speed_all_cpus();
> >>>>
> >>>>    if (!has_vector()) {
> >>>>            for_each_online_cpu(cpu)
> >>>>                    per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
> >>>> -  } else if (!check_vector_unaligned_access_emulated_all_cpus() &&
> >>>> +  } else if (!check_vector_unaligned_access_table() &&
> >>>> +             !check_vector_unaligned_access_emulated_all_cpus() &&
> >>>>               IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
> >>>>            kthread_run(vec_check_unaligned_access_speed_all_cpus,
> >>>>                        NULL, "vec_check_unaligned_access_speed_all_cpus");
> >>>
> >>>>
> >>>> Regarding your table, it feels like a bit going back to old hardcoded
> >>>> platform description ;). I think some kind of auto-detection of speed
> >>>> (not builtin the kernel) for platforms could be good as well to skip
> >>>> probing.
> >>>>
> >>>> A DT property also seems ok to me since the goal is to describe
> >>>> hardware. Would a common DT/ACPI property be appropriate ? The
> >>>> device_property API unified both so if we used some common property to
> >>>> describe the misaligned access speed (both in DT cpu node/ ACPI CPU
> >>>> device package), we could keep a single parsing method. But I'm no ACPI
> >>>> expert so I don't know if that really make sense.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Clément
> >>>>
> >>>>>
> >>>>> Thanks,
> >>>>> drew
> >>>>
> >>
>
Clément Léger Feb. 11, 2025, 8:37 a.m. UTC | #16
On 11/02/2025 05:26, Anup Patel wrote:
> On Tue, Feb 11, 2025 at 2:27 AM Clément Léger <cleger@rivosinc.com> wrote:
>>
>>
>>
>> On 10/02/2025 21:53, Charlie Jenkins wrote:
>>> On Mon, Feb 10, 2025 at 09:42:25PM +0100, Clément Léger wrote:
>>>>
>>>>
>>>> On 10/02/2025 18:20, Charlie Jenkins wrote:
>>>>> On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote:
>>>>>>
>>>>>>
>>>>>> On 10/02/2025 15:06, Andrew Jones wrote:
>>>>>>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/02/2025 11:16, Anup Patel wrote:
>>>>>>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
>>>>>>>>>>> Probing unaligned accesses on boot is time consuming. Provide a
>>>>>>>>>>> function which will be used to look up the access type in a table
>>>>>>>>>>> by id registers. Vendors which provide table entries can then skip
>>>>>>>>>>> the probing.
>>>>>>>>>>
>>>>>>>>>> The access checker in my experience is only time consuming on slow
>>>>>>>>>> hardware. Hardware that supports fast unaligned accesses isn't really
>>>>>>>>>> impacted by this? Avoiding a list of hardware that has slow/fast
>>>>>>>>>> unaligned accesses in the kernel was the main reason for dynamically
>>>>>>>>>> checking. We did introduce the config option to compile the kernel with
>>>>>>>>>> assumed slow/fast accesses, which of course has the downside of
>>>>>>>>>> recompiling the kernel and I assume that you already considered that.
>>>>>>>>>
>>>>>>>>> The kconfig option does not align with the vision of running the same
>>>>>>>>> kernel image across platforms.
>>>>>>>>
>>>>>>>> I'd would be advocating to remove compile time options as well and use
>>>>>>>> another way to skip the probe (see below).
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Instead of having a table in the kernel, something that would be more
>>>>>>>>>> platform agnostic would be to have an extension that signals this
>>>>>>>>>> information. That seems like it would accomplish the same goal and
>>>>>>>>>> leverage the existing infrastructure in the kernel, albeit with the need
>>>>>>>>>> to make a new extension.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> IMO, expecting an ISA extension to be defined for all possible
>>>>>>>>> microarchitectural choices is not going to scale so it is better
>>>>>>>>> to have infrastructure in kernel itself to infer microarchitectural
>>>>>>>>> choices based on RISC-V implementation ID.
>>>>>>>>
>>>>>>>> Since adding an extension seems quite unlikely, and that a device-tree
>>>>>>>> property is likely DT centric and not applicable to ACPI as well, was a
>>>>>>>> command line argument considered ?
>>>>>>>>
>>>>>>>
>>>>>>> I did consider adding a command line option in addition to the table,
>>>>>>> allowing platforms which neither have a table entry [yet] nor want to do
>>>>>>> the speed test, to set whatever they like. In the end, I dropped it, since
>>>>>>> I don't have a use case at this time. However, if we really don't want a
>>>>>>> table, then I can look into the command line option instead.
>>>>>>
>>>>>> Sorry if I wasn't clear, I wasn't considering this as a replacement for
>>>>>> your table but rather as a replacement to Charlie's compile time define
>>>>>> to skip misaligned speed probing since it is like "lpj=<x>". You can
>>>>>> specify it on command line if you want to skip the loop time detection
>>>>>> of loops per jiffies and have faster boot.
>>>>>
>>>>> Jesse sent out a patch for a kernel parameter to set the access speed to
>>>>> whatever is desired [1].
>>>>
>>>> Hey Charlie,
>>>>
>>>> Thanks but it seems you forgot to add the link ?
>>>
>>> Oops, I frequently do that...
>>>
>>> https://lore.kernel.org/linux-riscv/20240805173816.3722002-1-jesse@rivosinc.com/
>>>
>>>>
>>>> Having configuration option + command line option seems like something
>>>> particularly heavy for such feature. The ifdefery/config options
>>>> involved in the misaligned probing code is already quite complicated. If
>>>> another mean to specify the misaligned speed access is added, I think
>>>> all configuration options to set the speed of accesses can then be
>>>> removed and just keep the command line. That will certainly simplify the
>>>> ifdef/config options.
>>>
>>> Yeah that's why it didn't get merged because it felt like overkill. I
>>> responded on the thread to Anup as why I would prefer config options. It
>>> just comes down to config options being required to enable compiler
>>> features. The kernel is only built with rv64gc and usage of all other
>>> extensions requires hand written assembly. There are easy performance
>>> gains when compiling the kernel with rv64gc_zba_zbb_zbkb etc.
>>> Performance focused kernels will need to be recompiled anyway so I am of
>>> the opinion that grouping in other performance features as config
>>> options like this is the easiest thing to do and reduces the amount of
>>> code in the kernel.
>>
>> As answered on the other thread, totally agree, except for the
>> misaligned accesses probing config options ;). Ultimately, we need
>> profiles configuration, either via defconfigs that enables a bunch of
>> optimization via ISA extension or configuration options that groups
>> these config options.
> 
> I agree.
> 
> Kconfig options for ISA extensions are fine but not for
> misaligned access. The kernel command-line option to skip
> misaligned access probe is fine as well. I still think RISC-V
> implementation ID based microarchitecture feature discovery
> will be eventually required as well so why not add it now.

As said by Charlie, keeping an internal list of IDs that needs to be
updated/submitted to add new ones is quite a burden to maintain. IMHO,
this would be better to be put in ACPI or DT description since it purely
describe hardware behavior.

But you said it will be "eventually required" so you might have some
specific use-case in mind ?

> 
> For profiles, how about having an incremental rva23.config
> similar to 64-bit.config ? This will minimize the duplication
> by re-using the same base defconfig.

.config or a CONFIG_PROFILE option that allow to select the profile for
to be supported (CONFIG_PROFILE_RVA23). Using profile config options
might be better though since we can have better dependencies/select
between profiles rather than duplicating them in various profile
.config. So I might be in favor of a CONFIG_PROFILE rather than a
.config per profile.

BTW, that's not totally exclusive, if we want to add a rva23.config that
contains CONFIG_PROFILE_RVA23=y, that works as well, although a bit
overkill ;).

Regards,

Clément

> 
> Regards,
> Anup
> 
> 
> 
>>
>> Clément
>>
>>>
>>> - Charlie
>>>
>>>>
>>>> Clément
>>>>
>>>>>
>>>>> - Charlie
>>>>>
>>>>>> -}
>>>>>> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
>>>>>> -static void __init check_unaligned_access_speed_all_cpus(void)
>>>>>> -{
>>>>>> -}
>>>>>> -#endif
>>>>>> -
>>>>>>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
>>>>>>  static void check_vector_unaligned_access(struct work_struct *work __always_unused)
>>>>>>  {
>>>>>> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway
>>>>>>  }
>>>>>>  #endif
>>>>>>
>>>>>> +static bool check_vector_unaligned_access_table(void)
>>>>>> +{
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>>  static int riscv_online_cpu_vec(unsigned int cpu)
>>>>>>  {
>>>>>>    if (!has_vector()) {
>>>>>> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu)
>>>>>>            return 0;
>>>>>>    }
>>>>>>
>>>>>> +  if (check_vector_unaligned_access_table())
>>>>>> +          return 0;
>>>>>> +
>>>>>>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
>>>>>>    if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN)
>>>>>>            return 0;
>>>>>> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void)
>>>>>>  {
>>>>>>    int cpu;
>>>>>>
>>>>>> -  if (!check_unaligned_access_emulated_all_cpus())
>>>>>> +  if (!check_unaligned_access_table() &&
>>>>>> +      !check_unaligned_access_emulated_all_cpus())
>>>>>>            check_unaligned_access_speed_all_cpus();
>>>>>>
>>>>>>    if (!has_vector()) {
>>>>>>            for_each_online_cpu(cpu)
>>>>>>                    per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
>>>>>> -  } else if (!check_vector_unaligned_access_emulated_all_cpus() &&
>>>>>> +  } else if (!check_vector_unaligned_access_table() &&
>>>>>> +             !check_vector_unaligned_access_emulated_all_cpus() &&
>>>>>>               IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
>>>>>>            kthread_run(vec_check_unaligned_access_speed_all_cpus,
>>>>>>                        NULL, "vec_check_unaligned_access_speed_all_cpus");
>>>>>
>>>>>>
>>>>>> Regarding your table, it feels like a bit going back to old hardcoded
>>>>>> platform description ;). I think some kind of auto-detection of speed
>>>>>> (not builtin the kernel) for platforms could be good as well to skip
>>>>>> probing.
>>>>>>
>>>>>> A DT property also seems ok to me since the goal is to describe
>>>>>> hardware. Would a common DT/ACPI property be appropriate ? The
>>>>>> device_property API unified both so if we used some common property to
>>>>>> describe the misaligned access speed (both in DT cpu node/ ACPI CPU
>>>>>> device package), we could keep a single parsing method. But I'm no ACPI
>>>>>> expert so I don't know if that really make sense.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Clément
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> drew
>>>>>>
>>>>
>>
Andrew Jones Feb. 11, 2025, 9:04 a.m. UTC | #17
On Mon, Feb 10, 2025 at 09:37:10PM +0100, Clément Léger wrote:
> 
> 
> On 10/02/2025 18:19, Charlie Jenkins wrote:
> > On Mon, Feb 10, 2025 at 03:46:46PM +0530, Anup Patel wrote:
> >> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >>>
> >>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
> >>>> Probing unaligned accesses on boot is time consuming. Provide a
> >>>> function which will be used to look up the access type in a table
> >>>> by id registers. Vendors which provide table entries can then skip
> >>>> the probing.
> >>>
> >>> The access checker in my experience is only time consuming on slow
> >>> hardware. Hardware that supports fast unaligned accesses isn't really
> >>> impacted by this? Avoiding a list of hardware that has slow/fast
> >>> unaligned accesses in the kernel was the main reason for dynamically
> >>> checking. We did introduce the config option to compile the kernel with
> >>> assumed slow/fast accesses, which of course has the downside of
> >>> recompiling the kernel and I assume that you already considered that.
> >>
> >> The kconfig option does not align with the vision of running the same
> >> kernel image across platforms.
> > 
> > I just don't think that vision is realistic.
> > 
> > I am a proponent for compile time defines because ri ght now we are
> > catering the kernel to both microcontrollers and for high performance
> > platforms. I am in favor of having a set of configur ations that are
> > ideal for these microcontrollers and a different set for high
> > performance platforms. This is where the RVI profile s would ideally
> > come in, having different configs for different profiles that target low
> > performance/high performance.
> > 
> > Compiler optimizations for extensions are not possib le to do by just
> > having these different methods of selecting at runti me. By enabling
> > extra extensions like the bitmanip extensions during compilation via a
> > config flag we can optimize the entire kernel. It is not possible to
> > push all optimizations off to runtime detection.
> 
> While this might be true for the bitmanip extension and other extensions
> that the compiler can take advantage of, that isn't true for the
> misaligned speed probing code. The only meaningful misaligned access
> configuration option for kernel "speed" optimization is
> RISCV_EFFICIENT_UNALIGNED_ACCESS (which is ironically not easily
> selectable since it is under NON_PORTABLE).
> 
> Currently all the config options that have been added around misaligned
> support "just" allows to get rid of the probing code at compile time and
> set a predefined speed. That does not really improve the kernel
> performance itself, just allows for a faster boot. That could as well be
> supported using a command line option as suggested.

I agree. I'll look into stripping config options and picking up Jesse's
command line option for v2. I'll also drop the table approach of this
series since I don't have a strong enough justification for it over the
command line at this time.

Thanks,
drew

> 
> That being said, I agree that some additional configuration options
> should be added to enable additional extension support at compile time,
> enabling a faster kernel. Having a single image for all hardware is as
> you said not realistic but profiles configuration as you proposed might
> be the key for this support.
> 
> Clément
> 
> > 
> >>
> >>>
> >>> Instead of having a table in the kernel, something that would be more
> >>> platform agnostic would be to have an extension that signals this
> >>> information. That seems like it would accomplish the same goal and
> >>> leverage the existing infrastructure in the kernel, albeit with the need
> >>> to make a new extension.
> >>>
> >>
> >> IMO, expecting an ISA extension to be defined for all possible
> >> microarchitectural choices is not going to scale so it is better
> >> to have infrastructure in kernel itself to infer microarchitectural
> >> choices based on RISC-V implementation ID.
> > 
> > How is keeping tables in the kernel for all microarchitectural details
> > any more scalable than having extensions that do the same thing? I would
> > argue that having it in the kernel is less scalable since it needs to be
> > described for all implementation IDs, and all changes require going
> > through the kernel review process. Dynamic probing avoids these issues.
> > Having an extension has the one-time process of getting the extension
> > into something like a profile, but then anybody could use it without
> > needing a kernel patch.
> > 
> > - Charlie
> > 
> >>
> >> Regards,
> >> Anup
> > 
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
>
Palmer Dabbelt Feb. 11, 2025, 6:09 p.m. UTC | #18
On Mon, 10 Feb 2025 20:26:32 PST (-0800), apatel@ventanamicro.com wrote:
> On Tue, Feb 11, 2025 at 2:27 AM Clément Léger <cleger@rivosinc.com> wrote:
>>
>>
>>
>> On 10/02/2025 21:53, Charlie Jenkins wrote:
>> > On Mon, Feb 10, 2025 at 09:42:25PM +0100, Clément Léger wrote:
>> >>
>> >>
>> >> On 10/02/2025 18:20, Charlie Jenkins wrote:
>> >>> On Mon, Feb 10, 2025 at 03:20:34PM +0100, Clément Léger wrote:
>> >>>>
>> >>>>
>> >>>> On 10/02/2025 15:06, Andrew Jones wrote:
>> >>>>> On Mon, Feb 10, 2025 at 12:07:40PM +0100, Clément Léger wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>> On 10/02/2025 11:16, Anup Patel wrote:
>> >>>>>>> On Sat, Feb 8, 2025 at 6:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
>> >>>>>>>>
>> >>>>>>>> On Fri, Feb 07, 2025 at 05:19:47PM +0100, Andrew Jones wrote:
>> >>>>>>>>> Probing unaligned accesses on boot is time consuming. Provide a
>> >>>>>>>>> function which will be used to look up the access type in a table
>> >>>>>>>>> by id registers. Vendors which provide table entries can then skip
>> >>>>>>>>> the probing.
>> >>>>>>>>
>> >>>>>>>> The access checker in my experience is only time consuming on slow
>> >>>>>>>> hardware. Hardware that supports fast unaligned accesses isn't really
>> >>>>>>>> impacted by this? Avoiding a list of hardware that has slow/fast
>> >>>>>>>> unaligned accesses in the kernel was the main reason for dynamically
>> >>>>>>>> checking. We did introduce the config option to compile the kernel with
>> >>>>>>>> assumed slow/fast accesses, which of course has the downside of
>> >>>>>>>> recompiling the kernel and I assume that you already considered that.
>> >>>>>>>
>> >>>>>>> The kconfig option does not align with the vision of running the same
>> >>>>>>> kernel image across platforms.
>> >>>>>>
>> >>>>>> I'd would be advocating to remove compile time options as well and use
>> >>>>>> another way to skip the probe (see below).
>> >>>>>>
>> >>>>>>>
>> >>>>>>>>
>> >>>>>>>> Instead of having a table in the kernel, something that would be more
>> >>>>>>>> platform agnostic would be to have an extension that signals this
>> >>>>>>>> information. That seems like it would accomplish the same goal and
>> >>>>>>>> leverage the existing infrastructure in the kernel, albeit with the need
>> >>>>>>>> to make a new extension.
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> IMO, expecting an ISA extension to be defined for all possible
>> >>>>>>> microarchitectural choices is not going to scale so it is better
>> >>>>>>> to have infrastructure in kernel itself to infer microarchitectural
>> >>>>>>> choices based on RISC-V implementation ID.
>> >>>>>>
>> >>>>>> Since adding an extension seems quite unlikely, and that a device-tree
>> >>>>>> property is likely DT centric and not applicable to ACPI as well, was a
>> >>>>>> command line argument considered ?
>> >>>>>>
>> >>>>>
>> >>>>> I did consider adding a command line option in addition to the table,
>> >>>>> allowing platforms which neither have a table entry [yet] nor want to do
>> >>>>> the speed test, to set whatever they like. In the end, I dropped it, since
>> >>>>> I don't have a use case at this time. However, if we really don't want a
>> >>>>> table, then I can look into the command line option instead.
>> >>>>
>> >>>> Sorry if I wasn't clear, I wasn't considering this as a replacement for
>> >>>> your table but rather as a replacement to Charlie's compile time define
>> >>>> to skip misaligned speed probing since it is like "lpj=<x>". You can
>> >>>> specify it on command line if you want to skip the loop time detection
>> >>>> of loops per jiffies and have faster boot.
>> >>>
>> >>> Jesse sent out a patch for a kernel parameter to set the access speed to
>> >>> whatever is desired [1].
>> >>
>> >> Hey Charlie,
>> >>
>> >> Thanks but it seems you forgot to add the link ?
>> >
>> > Oops, I frequently do that...
>> >
>> > https://lore.kernel.org/linux-riscv/20240805173816.3722002-1-jesse@rivosinc.com/
>> >
>> >>
>> >> Having configuration option + command line option seems like something
>> >> particularly heavy for such feature. The ifdefery/config options
>> >> involved in the misaligned probing code is already quite complicated. If
>> >> another mean to specify the misaligned speed access is added, I think
>> >> all configuration options to set the speed of accesses can then be
>> >> removed and just keep the command line. That will certainly simplify the
>> >> ifdef/config options.
>> >
>> > Yeah that's why it didn't get merged because it felt like overkill. I
>> > responded on the thread to Anup as why I would prefer config options. It
>> > just comes down to config options being required to enable compiler
>> > features. The kernel is only built with rv64gc and usage of all other
>> > extensions requires hand written assembly. There are easy performance
>> > gains when compiling the kernel with rv64gc_zba_zbb_zbkb etc.
>> > Performance focused kernels will need to be recompiled anyway so I am of
>> > the opinion that grouping in other performance features as config
>> > options like this is the easiest thing to do and reduces the amount of
>> > code in the kernel.
>>
>> As answered on the other thread, totally agree, except for the
>> misaligned accesses probing config options ;). Ultimately, we need
>> profiles configuration, either via defconfigs that enables a bunch of
>> optimization via ISA extension or configuration options that groups
>> these config options.
>
> I agree.
>
> Kconfig options for ISA extensions are fine but not for
> misaligned access. The kernel command-line option to skip
> misaligned access probe is fine as well.

IMO it's reasonable to have a Kconfig option that does something like 
"build a kernel that assumes misaligned accesses are fast and 
supported".  We've got a bunch of other platform-level opt ins along 
those lines (drivers, extensions, etrrata, memory stuff, etc).  It would 
almost certainly be a measurable performance boost to enable misaligned 
accesses when building Linux, it is for every other big codebase.

> I still think RISC-V
> implementation ID based microarchitecture feature discovery
> will be eventually required as well so why not add it now.

Ya, the probing loops are nasty.  We originally tried adding this to the 
DT so we didn't have to probe for it, but that got shot down.  IMO it's 
way cleaner than having tables of vendor IDs in the kernel, maybe if 
that's the backup option it'll convince the DT people to change their 
minds?

> For profiles, how about having an incremental rva23.config
> similar to 64-bit.config ? This will minimize the duplication
> by re-using the same base defconfig.

The profiles don't really tell us anything, though.  They're really just 
marketing material, vendors just ignore the actually requirements and 
stamp whatever they ship as compliant.  If we start adding configs we 
should just go with some sort of "support what shipped in 2024" type 
definitions, that'd let us Kconfig-off some cruft for supporting old 
systems while still being able to actually target something concrete.

>
> Regards,
> Anup
>
>
>
>>
>> Clément
>>
>> >
>> > - Charlie
>> >
>> >>
>> >> Clément
>> >>
>> >>>
>> >>> - Charlie
>> >>>
>> >>>> -}
>> >>>> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
>> >>>> -static void __init check_unaligned_access_speed_all_cpus(void)
>> >>>> -{
>> >>>> -}
>> >>>> -#endif
>> >>>> -
>> >>>>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
>> >>>>  static void check_vector_unaligned_access(struct work_struct *work __always_unused)
>> >>>>  {
>> >>>> @@ -370,6 +380,11 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway
>> >>>>  }
>> >>>>  #endif
>> >>>>
>> >>>> +static bool check_vector_unaligned_access_table(void)
>> >>>> +{
>> >>>> +  return false;
>> >>>> +}
>> >>>> +
>> >>>>  static int riscv_online_cpu_vec(unsigned int cpu)
>> >>>>  {
>> >>>>    if (!has_vector()) {
>> >>>> @@ -377,6 +392,9 @@ static int riscv_online_cpu_vec(unsigned int cpu)
>> >>>>            return 0;
>> >>>>    }
>> >>>>
>> >>>> +  if (check_vector_unaligned_access_table())
>> >>>> +          return 0;
>> >>>> +
>> >>>>  #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
>> >>>>    if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN)
>> >>>>            return 0;
>> >>>> @@ -392,13 +410,15 @@ static int __init check_unaligned_access_all_cpus(void)
>> >>>>  {
>> >>>>    int cpu;
>> >>>>
>> >>>> -  if (!check_unaligned_access_emulated_all_cpus())
>> >>>> +  if (!check_unaligned_access_table() &&
>> >>>> +      !check_unaligned_access_emulated_all_cpus())
>> >>>>            check_unaligned_access_speed_all_cpus();
>> >>>>
>> >>>>    if (!has_vector()) {
>> >>>>            for_each_online_cpu(cpu)
>> >>>>                    per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
>> >>>> -  } else if (!check_vector_unaligned_access_emulated_all_cpus() &&
>> >>>> +  } else if (!check_vector_unaligned_access_table() &&
>> >>>> +             !check_vector_unaligned_access_emulated_all_cpus() &&
>> >>>>               IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
>> >>>>            kthread_run(vec_check_unaligned_access_speed_all_cpus,
>> >>>>                        NULL, "vec_check_unaligned_access_speed_all_cpus");
>> >>>
>> >>>>
>> >>>> Regarding your table, it feels like a bit going back to old hardcoded
>> >>>> platform description ;). I think some kind of auto-detection of speed
>> >>>> (not builtin the kernel) for platforms could be good as well to skip
>> >>>> probing.
>> >>>>
>> >>>> A DT property also seems ok to me since the goal is to describe
>> >>>> hardware. Would a common DT/ACPI property be appropriate ? The
>> >>>> device_property API unified both so if we used some common property to
>> >>>> describe the misaligned access speed (both in DT cpu node/ ACPI CPU
>> >>>> device package), we could keep a single parsing method. But I'm no ACPI
>> >>>> expert so I don't know if that really make sense.
>> >>>>
>> >>>> Thanks,
>> >>>>
>> >>>> Clément
>> >>>>
>> >>>>>
>> >>>>> Thanks,
>> >>>>> drew
>> >>>>
>> >>
>>
diff mbox series

Patch

diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index d9d4ca1fadc7..f8497097e79d 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -130,6 +130,50 @@  static void __init check_unaligned_access_nonboot_cpu(void *param)
 		check_unaligned_access(pages[cpu]);
 }
 
+/* Measure unaligned access speed on all CPUs present at boot in parallel. */
+static void __init check_unaligned_access_speed_all_cpus(void)
+{
+	unsigned int cpu;
+	unsigned int cpu_count = num_possible_cpus();
+	struct page **bufs = kcalloc(cpu_count, sizeof(*bufs), GFP_KERNEL);
+
+	if (!bufs) {
+		pr_warn("Allocation failure, not measuring misaligned performance\n");
+		return;
+	}
+
+	/*
+	 * Allocate separate buffers for each CPU so there's no fighting over
+	 * cache lines.
+	 */
+	for_each_cpu(cpu, cpu_online_mask) {
+		bufs[cpu] = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
+		if (!bufs[cpu]) {
+			pr_warn("Allocation failure, not measuring misaligned performance\n");
+			goto out;
+		}
+	}
+
+	/* Check everybody except 0, who stays behind to tend jiffies. */
+	on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1);
+
+	/* Check core 0. */
+	smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
+
+out:
+	for_each_cpu(cpu, cpu_online_mask) {
+		if (bufs[cpu])
+			__free_pages(bufs[cpu], MISALIGNED_BUFFER_ORDER);
+	}
+
+	kfree(bufs);
+}
+#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
+static void __init check_unaligned_access_speed_all_cpus(void)
+{
+}
+#endif
+
 DEFINE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key);
 
 static void modify_unaligned_access_branches(cpumask_t *mask, int weight)
@@ -186,8 +230,17 @@  static int __init lock_and_set_unaligned_access_static_branch(void)
 
 arch_initcall_sync(lock_and_set_unaligned_access_static_branch);
 
+static bool check_unaligned_access_table(void)
+{
+	return false;
+}
+
 static int riscv_online_cpu(unsigned int cpu)
 {
+	if (check_unaligned_access_table())
+		goto exit;
+
+#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
 	static struct page *buf;
 
 	/* We are already set since the last check */
@@ -203,6 +256,7 @@  static int riscv_online_cpu(unsigned int cpu)
 
 	check_unaligned_access(buf);
 	__free_pages(buf, MISALIGNED_BUFFER_ORDER);
+#endif
 
 exit:
 	set_unaligned_access_static_branches();
@@ -217,50 +271,6 @@  static int riscv_offline_cpu(unsigned int cpu)
 	return 0;
 }
 
-/* Measure unaligned access speed on all CPUs present at boot in parallel. */
-static void __init check_unaligned_access_speed_all_cpus(void)
-{
-	unsigned int cpu;
-	unsigned int cpu_count = num_possible_cpus();
-	struct page **bufs = kcalloc(cpu_count, sizeof(*bufs), GFP_KERNEL);
-
-	if (!bufs) {
-		pr_warn("Allocation failure, not measuring misaligned performance\n");
-		return;
-	}
-
-	/*
-	 * Allocate separate buffers for each CPU so there's no fighting over
-	 * cache lines.
-	 */
-	for_each_cpu(cpu, cpu_online_mask) {
-		bufs[cpu] = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
-		if (!bufs[cpu]) {
-			pr_warn("Allocation failure, not measuring misaligned performance\n");
-			goto out;
-		}
-	}
-
-	/* Check everybody except 0, who stays behind to tend jiffies. */
-	on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1);
-
-	/* Check core 0. */
-	smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
-
-out:
-	for_each_cpu(cpu, cpu_online_mask) {
-		if (bufs[cpu])
-			__free_pages(bufs[cpu], MISALIGNED_BUFFER_ORDER);
-	}
-
-	kfree(bufs);
-}
-#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
-static void __init check_unaligned_access_speed_all_cpus(void)
-{
-}
-#endif
-
 #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
 static void check_vector_unaligned_access(struct work_struct *work __always_unused)
 {
@@ -370,6 +380,11 @@  static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway
 }
 #endif
 
+static bool check_vector_unaligned_access_table(void)
+{
+	return false;
+}
+
 static int riscv_online_cpu_vec(unsigned int cpu)
 {
 	if (!has_vector()) {
@@ -377,6 +392,9 @@  static int riscv_online_cpu_vec(unsigned int cpu)
 		return 0;
 	}
 
+	if (check_vector_unaligned_access_table())
+		return 0;
+
 #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
 	if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN)
 		return 0;
@@ -392,13 +410,15 @@  static int __init check_unaligned_access_all_cpus(void)
 {
 	int cpu;
 
-	if (!check_unaligned_access_emulated_all_cpus())
+	if (!check_unaligned_access_table() &&
+	    !check_unaligned_access_emulated_all_cpus())
 		check_unaligned_access_speed_all_cpus();
 
 	if (!has_vector()) {
 		for_each_online_cpu(cpu)
 			per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
-	} else if (!check_vector_unaligned_access_emulated_all_cpus() &&
+	} else if (!check_vector_unaligned_access_table() &&
+		   !check_vector_unaligned_access_emulated_all_cpus() &&
 		   IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
 		kthread_run(vec_check_unaligned_access_speed_all_cpus,
 			    NULL, "vec_check_unaligned_access_speed_all_cpus");
@@ -408,10 +428,8 @@  static int __init check_unaligned_access_all_cpus(void)
 	 * Setup hotplug callbacks for any new CPUs that come online or go
 	 * offline.
 	 */
-#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
 	cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online",
 				  riscv_online_cpu, riscv_offline_cpu);
-#endif
 	cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online",
 				  riscv_online_cpu_vec, NULL);