[02/19] riscv: cpufeature: Fix thead vector hwcap removal

Message ID 20240411-dev-charlie-support_thead_vector_6_9-v1-2-4af9815ec746@rivosinc.com (mailing list archive)
State New, archived
Series riscv: Support vendor extensions and xtheadvector

Commit Message

Charlie Jenkins April 12, 2024, 4:11 a.m. UTC
The riscv_cpuinfo struct that contains mvendorid and marchid is not
populated until all harts are booted which happens after the DT parsing.
Use the vendorid/archid values from the DT if available or assume all
harts have the same values as the boot hart as a fallback.

Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/riscv/include/asm/sbi.h   |  2 ++
 arch/riscv/kernel/cpu.c        | 20 ++++++++++++++++++++
 arch/riscv/kernel/cpufeature.c | 22 ++++++++++++++++++++--
 3 files changed, 42 insertions(+), 2 deletions(-)

Comments

Conor Dooley April 12, 2024, 10:25 a.m. UTC | #1
On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote:
> The riscv_cpuinfo struct that contains mvendorid and marchid is not
> populated until all harts are booted which happens after the DT parsing.
> Use the vendorid/archid values from the DT if available or assume all
> harts have the same values as the boot hart as a fallback.
> 
> Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")

If this is our only use case for getting the mvendorid/marchid stuff
from dt, then I don't think we should add it. None of the devicetrees
that the commit you're fixing here addresses will have these properties
and if they did have them, they'd then also be new enough to hopefully
not have "v" either - the issue is they're using whatever crap the
vendor shipped.
If we're gonna get the information from DT, we already have something
that we can look at to perform the disable as the cpu compatibles give
us enough information to make the decision.

I also think that we could just cache the boot CPU's marchid/mvendorid,
since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid
repeating these ecalls on all systems.

Perhaps for now we could just look at the boot CPU alone? To my
knowledge the systems that this targets all have homogeneous
marchid/mvendorid values of 0x0.

> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>

> @@ -514,12 +521,23 @@ static void __init riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap)
>  				pr_warn("Unable to find \"riscv,isa\" devicetree entry\n");
>  				continue;
>  			}
> +			if (of_property_read_u64(node, "riscv,vendorid", &this_vendorid) < 0) {
> +				pr_warn("Unable to find \"riscv,vendorid\" devicetree entry, using boot hart mvendorid instead\n");

This should 100% not be a warning, it's not a required property in the
binding.

Cheers,
Conor.

> +				this_vendorid = boot_vendorid;
> +			}
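
The fallback in the quoted hunk, with the message demoted from a warning as suggested above, can be sketched in user-space C. This is only an illustration: `fake_cpu_node`, `fake_read_vendorid()` and `pick_vendorid()` are hypothetical names that stand in for the devicetree node and for `of_property_read_u64()`, and the debug print stands in for a lower-severity kernel log call.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a devicetree CPU node; not a kernel type. */
struct fake_cpu_node {
	bool has_vendorid;	/* true if the optional "riscv,vendorid" property exists */
	uint64_t vendorid;	/* property value, valid only when has_vendorid is true */
};

/*
 * Mimics the shape of of_property_read_u64(): returns 0 on success and a
 * negative value when the property is absent.
 */
static int fake_read_vendorid(const struct fake_cpu_node *node, uint64_t *out)
{
	if (!node->has_vendorid)
		return -1;
	*out = node->vendorid;
	return 0;
}

/*
 * Pick the vendor ID for one hart: prefer the devicetree value and fall
 * back to the boot hart's value. The property is optional, so its absence
 * is reported at debug level only, not as a warning.
 */
static uint64_t pick_vendorid(const struct fake_cpu_node *node,
			      uint64_t boot_vendorid)
{
	uint64_t this_vendorid;

	if (fake_read_vendorid(node, &this_vendorid) < 0) {
		fprintf(stderr,
			"debug: no \"riscv,vendorid\" property, using boot hart mvendorid\n");
		this_vendorid = boot_vendorid;
	}
	return this_vendorid;
}
```

Calling `pick_vendorid()` with a node that lacks the property returns the boot hart value; a node that carries the property overrides it.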
Evan Green April 12, 2024, 5:04 p.m. UTC | #2
On Fri, Apr 12, 2024 at 3:26 AM Conor Dooley <conor.dooley@microchip.com> wrote:
>
> On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote:
> > The riscv_cpuinfo struct that contains mvendorid and marchid is not
> > populated until all harts are booted which happens after the DT parsing.
> > Use the vendorid/archid values from the DT if available or assume all
> > harts have the same values as the boot hart as a fallback.
> >
> > Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
>
> If this is our only use case for getting the mvendorid/marchid stuff
> from dt, then I don't think we should add it. None of the devicetrees
> that the commit you're fixing here addresses will have these properties
> and if they did have them, they'd then also be new enough to hopefully
> not have "v" either - the issue is they're using whatever crap the
> vendor shipped.
> If we're gonna get the information from DT, we already have something
> that we can look at to perform the disable as the cpu compatibles give
> us enough information to make the decision.
>
> I also think that we could just cache the boot CPU's marchid/mvendorid,
> since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid
> repeating these ecalls on all systems.
>
> Perhaps for now we could just look at the boot CPU alone? To my
> knowledge the systems that this targets all have homogeneous
> marchid/mvendorid values of 0x0.

It's possible I'm misinterpreting, but is the suggestion to apply the
marchid/mvendorid we find on the boot CPU and assume it's the same on
all other CPUs? Since we're reporting the marchid/mvendorid/mimpid to
usermode in a per-hart way, it would be better IMO if we really do
query marchid/mvendorid/mimpid on each hart. The problem with applying
the boot CPU's value everywhere is if we're ever wrong in the future
(ie that assumption doesn't hold on some machine), we'll only find out
about it after the fact. Since we reported the wrong information to
usermode via hwprobe, it'll be an ugly userspace ABI issue to clean
up.

-Evan
Charlie Jenkins April 12, 2024, 5:12 p.m. UTC | #3
On Fri, Apr 12, 2024 at 11:25:47AM +0100, Conor Dooley wrote:
> On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote:
> > The riscv_cpuinfo struct that contains mvendorid and marchid is not
> > populated until all harts are booted which happens after the DT parsing.
> > Use the vendorid/archid values from the DT if available or assume all
> > harts have the same values as the boot hart as a fallback.
> > 
> > Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
> 
> If this is our only use case for getting the mvendorid/marchid stuff
> from dt, then I don't think we should add it. None of the devicetrees
> that the commit you're fixing here addresses will have these properties
> and if they did have them, they'd then also be new enough to hopefully
> not have "v" either - the issue is they're using whatever crap the
> vendor shipped.

Yes, the DTs those systems shipped with will not have the property, so
they will fall back on the boot hart. The addition of the DT properties allows
future heterogeneous systems to function.

> If we're gonna get the information from DT, we already have something
> that we can look at to perform the disable as the cpu compatibles give
> us enough information to make the decision.
> 
> I also think that we could just cache the boot CPU's marchid/mvendorid,
> since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid
> repeating these ecalls on all systems.

Yeah, that is a minor optimization that I can apply.

> 
> Perhaps for now we could just look at the boot CPU alone? To my
> knowledge the systems that this targets all have homogeneous
> marchid/mvendorid values of 0x0.

They have an mvendorid of 0x5b7.

This is already falling back on the boot CPU, but that is not a solution
that scales. Even though all systems currently have homogeneous
marchid/mvendorid I am hesitant to assert that all systems are
homogeneous without providing an option to override this. The overhead is
looking for a field in the DT which does not seem to be impactful enough
to prevent the addition of this option.

> 
> > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> 
> > @@ -514,12 +521,23 @@ static void __init riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap)
> >  				pr_warn("Unable to find \"riscv,isa\" devicetree entry\n");
> >  				continue;
> >  			}
> > +			if (of_property_read_u64(node, "riscv,vendorid", &this_vendorid) < 0) {
> > +				pr_warn("Unable to find \"riscv,vendorid\" devicetree entry, using boot hart mvendorid instead\n");
> 
> This should 100% not be a warning, it's not a required property in the
> binding.

Yes definitely, thank you.

- Charlie

> 
> Cheers,
> Conor.
> 
> > +				this_vendorid = boot_vendorid;
> > +			}
>
Conor Dooley April 12, 2024, 6:38 p.m. UTC | #4
On Fri, Apr 12, 2024 at 10:04:17AM -0700, Evan Green wrote:
> On Fri, Apr 12, 2024 at 3:26 AM Conor Dooley <conor.dooley@microchip.com> wrote:
> >
> > On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote:
> > > The riscv_cpuinfo struct that contains mvendorid and marchid is not
> > > populated until all harts are booted which happens after the DT parsing.
> > > Use the vendorid/archid values from the DT if available or assume all
> > > harts have the same values as the boot hart as a fallback.
> > >
> > > Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
> >
> > If this is our only use case for getting the mvendorid/marchid stuff
> > from dt, then I don't think we should add it. None of the devicetrees
> > that the commit you're fixing here addresses will have these properties
> > and if they did have them, they'd then also be new enough to hopefully
> > not have "v" either - the issue is they're using whatever crap the
> > vendor shipped.
> > If we're gonna get the information from DT, we already have something
> > that we can look at to perform the disable as the cpu compatibles give
> > us enough information to make the decision.
> >
> > I also think that we could just cache the boot CPU's marchid/mvendorid,
> > since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid
> > repeating these ecalls on all systems.
> >
> > Perhaps for now we could just look at the boot CPU alone? To my
> > knowledge the systems that this targets all have homogeneous
> > marchid/mvendorid values of 0x0.
> 
> It's possible I'm misinterpreting, but is the suggestion to apply the
> marchid/mvendorid we find on the boot CPU and assume it's the same on
> all other CPUs? Since we're reporting the marchid/mvendorid/mimpid to
> usermode in a per-hart way, it would be better IMO if we really do
> query marchid/mvendorid/mimpid on each hart. The problem with applying
> the boot CPU's value everywhere is if we're ever wrong in the future
> (ie that assumption doesn't hold on some machine), we'll only find out
> about it after the fact. Since we reported the wrong information to
> usermode via hwprobe, it'll be an ugly userspace ABI issue to clean
> up.

You're misinterpreting, we do get the values on all harts individually as
they're brought online. This is only used by the code that throws a bone
to people with crappy vendor dtbs that put "v" in riscv,isa when they
support the unratified version.
Charlie Jenkins April 12, 2024, 6:46 p.m. UTC | #5
On Fri, Apr 12, 2024 at 07:38:04PM +0100, Conor Dooley wrote:
> On Fri, Apr 12, 2024 at 10:04:17AM -0700, Evan Green wrote:
> > On Fri, Apr 12, 2024 at 3:26 AM Conor Dooley <conor.dooley@microchip.com> wrote:
> > >
> > > On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote:
> > > > The riscv_cpuinfo struct that contains mvendorid and marchid is not
> > > > populated until all harts are booted which happens after the DT parsing.
> > > > Use the vendorid/archid values from the DT if available or assume all
> > > > harts have the same values as the boot hart as a fallback.
> > > >
> > > > Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
> > >
> > > If this is our only use case for getting the mvendorid/marchid stuff
> > > from dt, then I don't think we should add it. None of the devicetrees
> > > that the commit you're fixing here addresses will have these properties
> > > and if they did have them, they'd then also be new enough to hopefully
> > > not have "v" either - the issue is they're using whatever crap the
> > > vendor shipped.
> > > If we're gonna get the information from DT, we already have something
> > > that we can look at to perform the disable as the cpu compatibles give
> > > us enough information to make the decision.
> > >
> > > I also think that we could just cache the boot CPU's marchid/mvendorid,
> > > since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid
> > > repeating these ecalls on all systems.
> > >
> > > Perhaps for now we could just look at the boot CPU alone? To my
> > > knowledge the systems that this targets all have homogeneous
> > > marchid/mvendorid values of 0x0.
> > 
> > It's possible I'm misinterpreting, but is the suggestion to apply the
> > marchid/mvendorid we find on the boot CPU and assume it's the same on
> > all other CPUs? Since we're reporting the marchid/mvendorid/mimpid to
> > usermode in a per-hart way, it would be better IMO if we really do
> > query marchid/mvendorid/mimpid on each hart. The problem with applying
> > the boot CPU's value everywhere is if we're ever wrong in the future
> > (ie that assumption doesn't hold on some machine), we'll only find out
> > about it after the fact. Since we reported the wrong information to
> > usermode via hwprobe, it'll be an ugly userspace ABI issue to clean
> > up.
> 
> You're misinterpreting, we do get the values on all harts individually as
> they're brought online. This is only used by the code that throws a bone
> to people with crappy vendor dtbs that put "v" in riscv,isa when they
> support the unratified version.

Not quite, the alternatives are patched before the other cpus are
booted, so the alternatives will have false positives resulting in
broken kernels.

- Charlie
Conor Dooley April 12, 2024, 6:47 p.m. UTC | #6
On Fri, Apr 12, 2024 at 10:12:20AM -0700, Charlie Jenkins wrote:
> On Fri, Apr 12, 2024 at 11:25:47AM +0100, Conor Dooley wrote:
> > On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote:
> > > The riscv_cpuinfo struct that contains mvendorid and marchid is not
> > > populated until all harts are booted which happens after the DT parsing.
> > > Use the vendorid/archid values from the DT if available or assume all
> > > harts have the same values as the boot hart as a fallback.
> > > 
> > > Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
> > 
> > If this is our only use case for getting the mvendorid/marchid stuff
> > from dt, then I don't think we should add it. None of the devicetrees
> > that the commit you're fixing here addresses will have these properties
> > and if they did have them, they'd then also be new enough to hopefully
> > not have "v" either - the issue is they're using whatever crap the
> > vendor shipped.
> 
> Yes, the DTs those systems shipped with will not have the property, so
> they will fall back on the boot hart. The addition of the DT properties allows
> future heterogeneous systems to function.

I think you've kinda missed the point about what the original code was
actually doing here. Really the kernel should not be doing validation of
the devicetree at all, but I was trying to avoid people shooting
themselves in the foot by doing something simple that would work for
their (incorrect) vendor dtbs.
Future heterogeneous systems should be using riscv,isa-extensions, which
is totally unaffected by this codepath (and setting actual values for
mimpid/marchid too ideally!).

> > If we're gonna get the information from DT, we already have something
> > that we can look at to perform the disable as the cpu compatibles give
> > us enough information to make the decision.
> > 
> > I also think that we could just cache the boot CPU's marchid/mvendorid,
> > since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid
> > repeating these ecalls on all systems.
> 
> Yeah, that is a minor optimization that I can apply.
> 
> > 
> > Perhaps for now we could just look at the boot CPU alone? To my
> > knowledge the systems that this targets all have homogeneous
> > marchid/mvendorid values of 0x0.
> 
> They have an mvendorid of 0x5b7.

That was a braino, clearly I should have typed "mimpid".

> This is already falling back on the boot CPU, but that is not a solution
> that scales. Even though all systems currently have homogeneous
> marchid/mvendorid I am hesitant to assert that all systems are
> homogeneous without providing an option to override this.

There already is an option. Use the non-deprecated property in your
new system for describing what extensions you support. We don't need to
add any more properties (for now at least).

> The overhead is
> looking for a field in the DT which does not seem to be impactful enough
> to prevent the addition of this option.
> 
> > 
> > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > 
> > > @@ -514,12 +521,23 @@ static void __init riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap)
> > >  				pr_warn("Unable to find \"riscv,isa\" devicetree entry\n");
> > >  				continue;
> > >  			}
> > > +			if (of_property_read_u64(node, "riscv,vendorid", &this_vendorid) < 0) {
> > > +				pr_warn("Unable to find \"riscv,vendorid\" devicetree entry, using boot hart mvendorid instead\n");
> > 
> > This should 100% not be a warning, it's not a required property in the
> > binding.
> 
> Yes definitely, thank you.
> 
> - Charlie
> 
> > 
> > Cheers,
> > Conor.
> > 
> > > +				this_vendorid = boot_vendorid;
> > > +			}
> > 
> 
>
Conor Dooley April 12, 2024, 7:26 p.m. UTC | #7
On Fri, Apr 12, 2024 at 11:46:21AM -0700, Charlie Jenkins wrote:
> On Fri, Apr 12, 2024 at 07:38:04PM +0100, Conor Dooley wrote:
> > On Fri, Apr 12, 2024 at 10:04:17AM -0700, Evan Green wrote:
> > > On Fri, Apr 12, 2024 at 3:26 AM Conor Dooley <conor.dooley@microchip.com> wrote:
> > > >
> > > > On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote:
> > > > > The riscv_cpuinfo struct that contains mvendorid and marchid is not
> > > > > populated until all harts are booted which happens after the DT parsing.
> > > > > Use the vendorid/archid values from the DT if available or assume all
> > > > > harts have the same values as the boot hart as a fallback.
> > > > >
> > > > > Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
> > > >
> > > > If this is our only use case for getting the mvendorid/marchid stuff
> > > > from dt, then I don't think we should add it. None of the devicetrees
> > > > that the commit you're fixing here addresses will have these properties
> > > > and if they did have them, they'd then also be new enough to hopefully
> > > > not have "v" either - the issue is they're using whatever crap the
> > > > vendor shipped.
> > > > If we're gonna get the information from DT, we already have something
> > > > that we can look at to perform the disable as the cpu compatibles give
> > > > us enough information to make the decision.
> > > >
> > > > I also think that we could just cache the boot CPU's marchid/mvendorid,
> > > > since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid
> > > > repeating these ecalls on all systems.
> > > >
> > > > Perhaps for now we could just look at the boot CPU alone? To my
> > > > knowledge the systems that this targets all have homogeneous
> > > > marchid/mvendorid values of 0x0.
> > > 
> > > It's possible I'm misinterpreting, but is the suggestion to apply the
> > > marchid/mvendorid we find on the boot CPU and assume it's the same on
> > > all other CPUs? Since we're reporting the marchid/mvendorid/mimpid to
> > > usermode in a per-hart way, it would be better IMO if we really do
> > > query marchid/mvendorid/mimpid on each hart. The problem with applying
> > > the boot CPU's value everywhere is if we're ever wrong in the future
> > > (ie that assumption doesn't hold on some machine), we'll only find out
> > > about it after the fact. Since we reported the wrong information to
> > > usermode via hwprobe, it'll be an ugly userspace ABI issue to clean
> > > up.
> > 
> > > You're misinterpreting, we do get the values on all harts individually as
> > they're brought online. This is only used by the code that throws a bone
> > to people with crappy vendor dtbs that put "v" in riscv,isa when they
> > support the unratified version.
> 
> Not quite,

Remember that this patch stands in isolation and the justification given
in your commit message does not mention anything other than fixing my
broken patch.

> the alternatives are patched before the other cpus are
> booted, so the alternatives will have false positives resulting in
> broken kernels.

Over-eagerly disabling vector isn't going to break any kernels and
really should not break a behaving userspace either.
Under-eagerly disabling it (in a way that this approach could solve) is
only going to happen on a system where the boot hart has non-zero values
and claims support for v but a non-boot hart has zero values and
claims support for v but actually doesn't implement the ratified version.
If the boot hart doesn't support v, then we currently disable the
extension as only homogeneous stuff is supported by Linux. If the boot
hart claims support for "v" but doesn't actually implement the ratified
version neither the intent of my original patch nor this fix for it are
going to help avoid a broken kernel.

I think we do have a problem if the boot cpu having some erratum leads
to the kernel being patched in a way that does not work for the other
CPUs on the system, but I don't think this series addresses that sort of
issue at all as you'd be adding code to the pi section if you were fixing
it. I also don't think we should be making pre-emptive changes to the
errata patching code either to solve that sort of problem, until an SoC
shows up where things don't work.

Cheers,
Conor.
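
The "boot hart only" approach discussed in this thread can be sketched as follows. This is a minimal user-space illustration, not the kernel code: `boot_hart_ids`, `should_ignore_v()` and `filter_hwcap()` are hypothetical names, and the exact condition (T-Head mvendorid together with a zero marchid) is an assumption based on the values mentioned in this thread.

```c
#include <stdbool.h>

/* mvendorid value quoted earlier in the thread for the affected systems. */
#define THEAD_VENDOR_ID 0x5b7UL

/* Hypothetical container for the boot hart's IDs, read once and reused. */
struct boot_hart_ids {
	unsigned long mvendorid;
	unsigned long marchid;
};

/*
 * Assumed condition: "v" taken from riscv,isa is not trusted when the boot
 * hart reports the T-Head vendor ID and a zero marchid.
 */
static bool should_ignore_v(const struct boot_hart_ids *ids)
{
	return ids->mvendorid == THEAD_VENDOR_ID && ids->marchid == 0x0;
}

/*
 * Apply the boot hart's verdict to any hart's hwcap word. This is the naive
 * part: every hart is assumed to match the boot hart.
 */
static unsigned long filter_hwcap(unsigned long hwcap, unsigned long v_bit,
				  const struct boot_hart_ids *ids)
{
	if (should_ignore_v(ids))
		hwcap &= ~v_bit;
	return hwcap;
}
```

With this shape, an over-eager match only clears the vector bit; it never sets one that the devicetree did not claim.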
Charlie Jenkins April 12, 2024, 8:34 p.m. UTC | #8
On Fri, Apr 12, 2024 at 08:26:12PM +0100, Conor Dooley wrote:
> On Fri, Apr 12, 2024 at 11:46:21AM -0700, Charlie Jenkins wrote:
> > On Fri, Apr 12, 2024 at 07:38:04PM +0100, Conor Dooley wrote:
> > > On Fri, Apr 12, 2024 at 10:04:17AM -0700, Evan Green wrote:
> > > > On Fri, Apr 12, 2024 at 3:26 AM Conor Dooley <conor.dooley@microchip.com> wrote:
> > > > >
> > > > > On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote:
> > > > > > The riscv_cpuinfo struct that contains mvendorid and marchid is not
> > > > > > populated until all harts are booted which happens after the DT parsing.
> > > > > > Use the vendorid/archid values from the DT if available or assume all
> > > > > > harts have the same values as the boot hart as a fallback.
> > > > > >
> > > > > > Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
> > > > >
> > > > > If this is our only use case for getting the mvendorid/marchid stuff
> > > > > from dt, then I don't think we should add it. None of the devicetrees
> > > > > that the commit you're fixing here addresses will have these properties
> > > > > and if they did have them, they'd then also be new enough to hopefully
> > > > > not have "v" either - the issue is they're using whatever crap the
> > > > > vendor shipped.
> > > > > If we're gonna get the information from DT, we already have something
> > > > > that we can look at to perform the disable as the cpu compatibles give
> > > > > us enough information to make the decision.
> > > > >
> > > > > I also think that we could just cache the boot CPU's marchid/mvendorid,
> > > > > since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid
> > > > > repeating these ecalls on all systems.
> > > > >
> > > > > Perhaps for now we could just look at the boot CPU alone? To my
> > > > > knowledge the systems that this targets all have homogeneous
> > > > > marchid/mvendorid values of 0x0.
> > > > 
> > > > It's possible I'm misinterpreting, but is the suggestion to apply the
> > > > marchid/mvendorid we find on the boot CPU and assume it's the same on
> > > > all other CPUs? Since we're reporting the marchid/mvendorid/mimpid to
> > > > usermode in a per-hart way, it would be better IMO if we really do
> > > > query marchid/mvendorid/mimpid on each hart. The problem with applying
> > > > the boot CPU's value everywhere is if we're ever wrong in the future
> > > > (ie that assumption doesn't hold on some machine), we'll only find out
> > > > about it after the fact. Since we reported the wrong information to
> > > > usermode via hwprobe, it'll be an ugly userspace ABI issue to clean
> > > > up.
> > > 
> > > > You're misinterpreting, we do get the values on all harts individually as
> > > they're brought online. This is only used by the code that throws a bone
> > > to people with crappy vendor dtbs that put "v" in riscv,isa when they
> > > support the unratified version.
> > 
> > Not quite,
> 
> Remember that this patch stands in isolation and the justification given
> in your commit message does not mention anything other than fixing my
> broken patch.

Fixing the patch in the simplest sense would be to eagerly get the
mvendorid/marchid without using the cached version. But this assumes
that all harts have the same mvendorid/marchid. This is not something
that I am strongly attached to. If it truly is detrimental to Linux to
allow a user a way to specify different vendorids for different harts
then I will remove that code.

- Charlie

> 
> > the alternatives are patched before the other cpus are
> > booted, so the alternatives will have false positives resulting in
> > broken kernels.
> 
> Over-eagerly disabling vector isn't going to break any kernels and
> really should not break a behaving userspace either.
> Under-eagerly disabling it (in a way that this approach could solve) is
> only going to happen on a system where the boot hart has non-zero values
> and claims support for v but a non-boot hart has zero values and
> claims support for v but actually doesn't implement the ratified version.
> If the boot hart doesn't support v, then we currently disable the
> extension as only homogeneous stuff is supported by Linux. If the boot
> hart claims support for "v" but doesn't actually implement the ratified
> version neither the intent of my original patch nor this fix for it are
> going to help avoid a broken kernel.
> 
> I think we do have a problem if the boot cpu having some erratum leads
> to the kernel being patched in a way that does not work for the other
> CPUs on the system, but I don't think this series addresses that sort of
> issue at all as you'd be adding code to the pi section if you were fixing
> it. I also don't think we should be making pre-emptive changes to the
> errata patching code either to solve that sort of problem, until an SoC
> shows up where things don't work.
> Cheers,
> Conor.
Conor Dooley April 12, 2024, 8:42 p.m. UTC | #9
On Fri, Apr 12, 2024 at 01:34:43PM -0700, Charlie Jenkins wrote:
> On Fri, Apr 12, 2024 at 08:26:12PM +0100, Conor Dooley wrote:
> > On Fri, Apr 12, 2024 at 11:46:21AM -0700, Charlie Jenkins wrote:
> > > On Fri, Apr 12, 2024 at 07:38:04PM +0100, Conor Dooley wrote:
> > > > On Fri, Apr 12, 2024 at 10:04:17AM -0700, Evan Green wrote:
> > > > > On Fri, Apr 12, 2024 at 3:26 AM Conor Dooley <conor.dooley@microchip.com> wrote:
> > > > > >
> > > > > > On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote:
> > > > > > > The riscv_cpuinfo struct that contains mvendorid and marchid is not
> > > > > > > populated until all harts are booted which happens after the DT parsing.
> > > > > > > Use the vendorid/archid values from the DT if available or assume all
> > > > > > > harts have the same values as the boot hart as a fallback.
> > > > > > >
> > > > > > > Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
> > > > > >
> > > > > > If this is our only use case for getting the mvendorid/marchid stuff
> > > > > > from dt, then I don't think we should add it. None of the devicetrees
> > > > > > that the commit you're fixing here addresses will have these properties
> > > > > > and if they did have them, they'd then also be new enough to hopefully
> > > > > > not have "v" either - the issue is they're using whatever crap the
> > > > > > vendor shipped.
> > > > > > If we're gonna get the information from DT, we already have something
> > > > > > that we can look at to perform the disable as the cpu compatibles give
> > > > > > us enough information to make the decision.
> > > > > >
> > > > > > I also think that we could just cache the boot CPU's marchid/mvendorid,
> > > > > > since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid
> > > > > > repeating these ecalls on all systems.
> > > > > >
> > > > > > Perhaps for now we could just look at the boot CPU alone? To my
> > > > > > knowledge the systems that this targets all have homogeneous
> > > > > > marchid/mvendorid values of 0x0.
> > > > > 
> > > > > It's possible I'm misinterpreting, but is the suggestion to apply the
> > > > > marchid/mvendorid we find on the boot CPU and assume it's the same on
> > > > > all other CPUs? Since we're reporting the marchid/mvendorid/mimpid to
> > > > > usermode in a per-hart way, it would be better IMO if we really do
> > > > > query marchid/mvendorid/mimpid on each hart. The problem with applying
> > > > > the boot CPU's value everywhere is if we're ever wrong in the future
> > > > > (ie that assumption doesn't hold on some machine), we'll only find out
> > > > > about it after the fact. Since we reported the wrong information to
> > > > > usermode via hwprobe, it'll be an ugly userspace ABI issue to clean
> > > > > up.
> > > > 
> > > > > You're misinterpreting, we do get the values on all harts individually as
> > > > they're brought online. This is only used by the code that throws a bone
> > > > to people with crappy vendor dtbs that put "v" in riscv,isa when they
> > > > support the unratified version.
> > > 
> > > Not quite,
> > 
> > Remember that this patch stands in isolation and the justification given
> > in your commit message does not mention anything other than fixing my
> > broken patch.
> 
> Fixing the patch in the simplest sense would be to eagerly get the
> mvendorid/marchid without using the cached version. But this assumes
> that all harts have the same mvendorid/marchid. This is not something
> that I am strongly attached to. If it truly is detrimental to Linux to
> allow a user a way to specify different vendorids for different harts
> then I will remove that code.

I think that the simple fix is all that we need to do here, perhaps
updating the comment to point out how naive we are being.
> > 
> > > the alternatives are patched before the other cpus are
> > > booted, so the alternatives will have false positives resulting in
> > > broken kernels.
> > 
> > Over-eagerly disabling vector isn't going to break any kernels and
> > really should not break a behaving userspace either.
> > Under-eagerly disabling it (in a way that this approach could solve) is
> > only going to happen on a system where the boot hart has non-zero values
> > and claims support for v but a non-boot hart has zero values and
> > claims support for v but actually doesn't implement the ratified version.
> > If the boot hart doesn't support v, then we currently disable the
> > extension as only homogeneous stuff is supported by Linux. If the boot
> > hart claims support for "v" but doesn't actually implement the ratified
> > version neither the intent of my original patch nor this fix for it are
> > going to help avoid a broken kernel.
> > 
> > I think we do have a problem if the boot cpu having some erratum leads
> > to the kernel being patched in a way that does not work for the other
> > CPUs on the system, but I don't think this series addresses that sort of
> > issue at all as you'd be adding code to the pi section if you were fixing
> > it. I also don't think we should be making pre-emptive changes to the
> > errata patching code either to solve that sort of problem, until an SoC
> > shows up where things don't work.
> > Cheers,
> > Conor.
> 
>
Charlie Jenkins April 12, 2024, 8:48 p.m. UTC | #10
On Fri, Apr 12, 2024 at 07:47:48PM +0100, Conor Dooley wrote:
> On Fri, Apr 12, 2024 at 10:12:20AM -0700, Charlie Jenkins wrote:
> > On Fri, Apr 12, 2024 at 11:25:47AM +0100, Conor Dooley wrote:
> > > On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote:
> > > > The riscv_cpuinfo struct that contains mvendorid and marchid is not
> > > > populated until all harts are booted which happens after the DT parsing.
> > > > Use the vendorid/archid values from the DT if available or assume all
> > > > harts have the same values as the boot hart as a fallback.
> > > > 
> > > > Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
> > > 
> > > If this is our only use case for getting the mvendorid/marchid stuff
> > > from dt, then I don't think we should add it. None of the devicetrees
> > > that the commit you're fixing here addresses will have these properties
> > > and if they did have them, they'd then also be new enough to hopefully
> > > not have "v" either - the issue is they're using whatever crap the
> > > vendor shipped.
> > 
> > Yes, the DT those shipped with will not have the property in the DT so
> > will fall back on the boot hart. The addition of the DT properties allow
> > future heterogenous systems to be able to function.
> 
> I think you've kinda missed the point about what the original code was
> actually doing here. Really the kernel should not be doing validation of
> the devicetree at all, but I was trying to avoid people shooting
> themselves in the foot by doing something simple that would work for
> their (incorrect) vendor dtbs.
> Future heterogenous systems should be using riscv,isa-extensions, which
> is totally unaffected by this codepath (and setting actual values for
> mimpid/marchid too ideally!).
> 

I am on the same page with you about that. 

> > > If we're gonna get the information from DT, we already have something
> > > that we can look at to perform the disable as the cpu compatibles give
> > > us enough information to make the decision.
> > > 
> > > I also think that we could just cache the boot CPU's marchid/mvendorid,
> > > since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid
> > > repeating these ecalls on all systems.
> > 
> > Yeah that is a minor optimization that I can apply.
> > 
> > > 
> > > Perhaps for now we could just look at the boot CPU alone? To my
> > > knowledge the systems that this targets all have homogeneous
> > > marchid/mvendorid values of 0x0.
> > 
> > They have an mvendorid of 0x5b7.
> 
> That was a braino, clearly I should have typed "mimpid".
> 
> > This is already falling back on the boot CPU, but that is not a solution
> > that scales. Even though all systems currently have homogenous
> > marchid/mvendorid I am hesitant to assert that all systems are
> > homogenous without providing an option to override this.
> 
> There already is an option. Use the non-deprecated property in your
> new system for describing what extensions you support. We don't need to
> add any more properties (for now at least).

The issue is that it is not possible to know which vendor extensions are
associated with a vendor. That requires a global namespace where each
extension can be looked up in a table. I have opted to have a
vendor-specific namespace so that vendors don't have to worry about
stepping on other vendors' toes (or the other way around). In order to
support that, the vendorid of the hart needs to be known beforehand.
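
As a rough illustration of the vendor-specific namespace described above
(a sketch only, not code from this series): the hart's vendor ID selects
a per-vendor table, and the extension name is searched only within that
table. The struct names, vendor_ext_lookup(), and the table contents are
hypothetical; 0x5b7 is the T-Head mvendorid mentioned earlier in the
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types and tables for illustration; not taken from the series. */
struct vendor_ext {
	const char *name;	/* name as it appears in the ISA string */
	unsigned int id;	/* bit index, meaningful only within this vendor */
};

struct vendor_ext_list {
	unsigned long vendorid;	/* mvendorid that selects this table */
	const struct vendor_ext *exts;
	size_t count;
};

#define THEAD_VENDOR_ID 0x5b7UL	/* T-Head mvendorid, as quoted in this thread */

static const struct vendor_ext thead_exts[] = {
	{ "xtheadvector", 0 },
};

static const struct vendor_ext_list vendor_lists[] = {
	{ THEAD_VENDOR_ID, thead_exts, sizeof(thead_exts) / sizeof(thead_exts[0]) },
};

/*
 * Return the vendor-local bit index of @name for a hart whose vendor ID is
 * @vendorid, or -1 if that vendor's table does not list the extension.
 */
static int vendor_ext_lookup(unsigned long vendorid, const char *name)
{
	size_t i, j;

	assert(name != NULL);

	for (i = 0; i < sizeof(vendor_lists) / sizeof(vendor_lists[0]); i++) {
		if (vendor_lists[i].vendorid != vendorid)
			continue;	/* other vendors' tables are never searched */
		for (j = 0; j < vendor_lists[i].count; j++) {
			if (strcmp(vendor_lists[i].exts[j].name, name) == 0)
				return (int)vendor_lists[i].exts[j].id;
		}
	}
	return -1;
}
```

With this layout the same extension name resolves on a hart reporting
0x5b7 but not on a hart reporting any other vendor ID, which is why the
vendor ID has to be known before the ISA string is parsed.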

I know a rebuttal here is that this is taking away from the point of
the original patch. I can split this patch up if so. The goal here is to
allow vendor extensions to play nicely with the rest of the system.
There are two uses of the mvendorid DT value, this fix, and the patch
that adds vendor extension support. I felt that it was appropriate to
wrap the mvendorid DT value into this patch, but if you would prefer
that to live separately from this fix then that is fine too.

- Charlie

> 
> > The overhead is
> > looking for a field in the DT which does not seem to be impactful enough
> > to prevent the addition of this option.
> > 
> > > 
> > > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > 
> > > > @@ -514,12 +521,23 @@ static void __init riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap)
> > > >  				pr_warn("Unable to find \"riscv,isa\" devicetree entry\n");
> > > >  				continue;
> > > >  			}
> > > > +			if (of_property_read_u64(node, "riscv,vendorid", &this_vendorid) < 0) {
> > > > +				pr_warn("Unable to find \"riscv,vendorid\" devicetree entry, using boot hart mvendorid instead\n");
> > > 
> > > This should 100% not be a warning, it's not a required property in the
> > > binding.
> > 
> > Yes definitely, thank you.
> > 
> > - Charlie
> > 
> > > 
> > > Cheers,
> > > Conor.
> > > 
> > > > +				this_vendorid = boot_vendorid;
> > > > +			}
> > > 
> > 
> >
Conor Dooley April 12, 2024, 9:27 p.m. UTC | #11
On Fri, Apr 12, 2024 at 01:48:46PM -0700, Charlie Jenkins wrote:
> On Fri, Apr 12, 2024 at 07:47:48PM +0100, Conor Dooley wrote:
> > On Fri, Apr 12, 2024 at 10:12:20AM -0700, Charlie Jenkins wrote:
> > > On Fri, Apr 12, 2024 at 11:25:47AM +0100, Conor Dooley wrote:
> > > > On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote:
> > > > > The riscv_cpuinfo struct that contains mvendorid and marchid is not
> > > > > populated until all harts are booted which happens after the DT parsing.
> > > > > Use the vendorid/archid values from the DT if available or assume all
> > > > > harts have the same values as the boot hart as a fallback.
> > > > > 
> > > > > Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
> > > > 
> > > > If this is our only use case for getting the mvendorid/marchid stuff
> > > > from dt, then I don't think we should add it. None of the devicetrees
> > > > that the commit you're fixing here addresses will have these properties
> > > > and if they did have them, they'd then also be new enough to hopefully
> > > > not have "v" either - the issue is they're using whatever crap the
> > > > vendor shipped.
> > > 
> > > Yes, the DT those shipped with will not have the property in the DT so
> > > will fall back on the boot hart. The addition of the DT properties allow
> > > future heterogenous systems to be able to function.
> > 
> > I think you've kinda missed the point about what the original code was
> > actually doing here. Really the kernel should not be doing validation of
> > the devicetree at all, but I was trying to avoid people shooting
> > themselves in the foot by doing something simple that would work for
> > their (incorrect) vendor dtbs.
> > Future heterogenous systems should be using riscv,isa-extensions, which
> > is totally unaffected by this codepath (and setting actual values for
> > mimpid/marchid too ideally!).
> > 
> 
> I am on the same page with you about that. 
> 
> > > > If we're gonna get the information from DT, we already have something
> > > > that we can look at to perform the disable as the cpu compatibles give
> > > > us enough information to make the decision.
> > > > 
> > > > I also think that we could just cache the boot CPU's marchid/mvendorid,
> > > > since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid
> > > > repeating these ecalls on all systems.
> > > 
> > > Yeah that is a minor optimization that I can apply.
> > > 
> > > > 
> > > > Perhaps for now we could just look at the boot CPU alone? To my
> > > > knowledge the systems that this targets all have homogeneous
> > > > marchid/mvendorid values of 0x0.
> > > 
> > > They have an mvendorid of 0x5b7.
> > 
> > That was a braino, clearly I should have typed "mimpid".
> > 
> > > This is already falling back on the boot CPU, but that is not a solution
> > > that scales. Even though all systems currently have homogenous
> > > marchid/mvendorid I am hesitant to assert that all systems are
> > > homogenous without providing an option to override this.
> > 
> > There already is an option. Use the non-deprecated property in your
> > new system for describing what extensions you support. We don't need to
> > add any more properties (for now at least).
> 
> The issue is that it is not possible to know which vendor extensions are
> associated with a vendor. That requires a global namespace where each
> extension can be looked up in a table. I have opted to have a
> vendor-specific namespace so that vendors don't have to worry about
> stepping on other vendor's toes (or the other way around). In order to
> support that, the vendorid of the hart needs to be known prior.

Nah, I think you're mixing up something like hwprobe and having
namespaces there with needing namespacing on the devicetree probing side
too. You don't need any vendor namespacing, it's perfectly fine (IMO)
for a vendor to implement someone else's extension and I think we should
allow probing any vendor's extension on any CPU.

> I know a rebuttal here is that this is taking away from the point of
> the original patch. I can split this patch up if so. The goal here is to
> allow vendor extensions to play nicely with the rest of the system.
> There are two uses of the mvendorid DT value, this fix, and the patch
> that adds vendor extension support. I felt that it was applicable to
> wrap the mvendorid DT value into this patch, but if you would prefer
> that to live separate of this fix then that is fine too.
> 
> - Charlie
> 
> > 
> > > The overhead is
> > > looking for a field in the DT which does not seem to be impactful enough
> > > to prevent the addition of this option.
> > > 
> > > > 
> > > > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > > 
> > > > > @@ -514,12 +521,23 @@ static void __init riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap)
> > > > >  				pr_warn("Unable to find \"riscv,isa\" devicetree entry\n");
> > > > >  				continue;
> > > > >  			}
> > > > > +			if (of_property_read_u64(node, "riscv,vendorid", &this_vendorid) < 0) {
> > > > > +				pr_warn("Unable to find \"riscv,vendorid\" devicetree entry, using boot hart mvendorid instead\n");
> > > > 
> > > > This should 100% not be a warning, it's not a required property in the
> > > > binding.
> > > 
> > > Yes definitely, thank you.
> > > 
> > > - Charlie
> > > 
> > > > 
> > > > Cheers,
> > > > Conor.
> > > > 
> > > > > +				this_vendorid = boot_vendorid;
> > > > > +			}
> > > > 
> > > 
> > > 
> 
>
Charlie Jenkins April 12, 2024, 9:31 p.m. UTC | #12
On Fri, Apr 12, 2024 at 10:27:47PM +0100, Conor Dooley wrote:
> On Fri, Apr 12, 2024 at 01:48:46PM -0700, Charlie Jenkins wrote:
> > On Fri, Apr 12, 2024 at 07:47:48PM +0100, Conor Dooley wrote:
> > > On Fri, Apr 12, 2024 at 10:12:20AM -0700, Charlie Jenkins wrote:
> > > > On Fri, Apr 12, 2024 at 11:25:47AM +0100, Conor Dooley wrote:
> > > > > On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote:
> > > > > > The riscv_cpuinfo struct that contains mvendorid and marchid is not
> > > > > > populated until all harts are booted which happens after the DT parsing.
> > > > > > Use the vendorid/archid values from the DT if available or assume all
> > > > > > harts have the same values as the boot hart as a fallback.
> > > > > > 
> > > > > > Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
> > > > > 
> > > > > If this is our only use case for getting the mvendorid/marchid stuff
> > > > > from dt, then I don't think we should add it. None of the devicetrees
> > > > > that the commit you're fixing here addresses will have these properties
> > > > > and if they did have them, they'd then also be new enough to hopefully
> > > > > not have "v" either - the issue is they're using whatever crap the
> > > > > vendor shipped.
> > > > 
> > > > Yes, the DT those shipped with will not have the property in the DT so
> > > > will fall back on the boot hart. The addition of the DT properties allow
> > > > future heterogenous systems to be able to function.
> > > 
> > > I think you've kinda missed the point about what the original code was
> > > actually doing here. Really the kernel should not be doing validation of
> > > the devicetree at all, but I was trying to avoid people shooting
> > > themselves in the foot by doing something simple that would work for
> > > their (incorrect) vendor dtbs.
> > > Future heterogenous systems should be using riscv,isa-extensions, which
> > > is totally unaffected by this codepath (and setting actual values for
> > > mimpid/marchid too ideally!).
> > > 
> > 
> > I am on the same page with you about that. 
> > 
> > > > > If we're gonna get the information from DT, we already have something
> > > > > that we can look at to perform the disable as the cpu compatibles give
> > > > > us enough information to make the decision.
> > > > > 
> > > > > I also think that we could just cache the boot CPU's marchid/mvendorid,
> > > > > since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid
> > > > > repeating these ecalls on all systems.
> > > > 
> > > > Yeah that is a minor optimization that I can apply.
> > > > 
> > > > > 
> > > > > Perhaps for now we could just look at the boot CPU alone? To my
> > > > > knowledge the systems that this targets all have homogeneous
> > > > > marchid/mvendorid values of 0x0.
> > > > 
> > > > They have an mvendorid of 0x5b7.
> > > 
> > > That was a braino, clearly I should have typed "mimpid".
> > > 
> > > > This is already falling back on the boot CPU, but that is not a solution
> > > > that scales. Even though all systems currently have homogenous
> > > > marchid/mvendorid I am hesitant to assert that all systems are
> > > > homogenous without providing an option to override this.
> > > 
> > > There already is an option. Use the non-deprecated property in your
> > > new system for describing what extensions you support. We don't need to
> > > add any more properties (for now at least).
> > 
> > The issue is that it is not possible to know which vendor extensions are
> > associated with a vendor. That requires a global namespace where each
> > extension can be looked up in a table. I have opted to have a
> > vendor-specific namespace so that vendors don't have to worry about
> > stepping on other vendor's toes (or the other way around). In order to
> > support that, the vendorid of the hart needs to be known prior.
> 
> Nah, I think you're mixing up something like hwprobe and having
> namespaces there with needing namespacing on the devicetree probing side
> too. You don't need any vendor namespacing, it's perfectly fine (IMO)
> for a vendor to implement someone else's extension and I think we should
> allow probing any vendors extension on any CPU.

I am not mixing it up. Sure a vendor can implement somebody else's
extension, they just need to add it to their namespace too.

- Charlie

> 
> > I know a rebuttal here is that this is taking away from the point of
> > the original patch. I can split this patch up if so. The goal here is to
> > allow vendor extensions to play nicely with the rest of the system.
> > There are two uses of the mvendorid DT value, this fix, and the patch
> > that adds vendor extension support. I felt that it was applicable to
> > wrap the mvendorid DT value into this patch, but if you would prefer
> > that to live separate of this fix then that is fine too.
> > 
> > - Charlie
> > 
> > > 
> > > > The overhead is
> > > > looking for a field in the DT which does not seem to be impactful enough
> > > > to prevent the addition of this option.
> > > > 
> > > > > 
> > > > > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > > > 
> > > > > > @@ -514,12 +521,23 @@ static void __init riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap)
> > > > > >  				pr_warn("Unable to find \"riscv,isa\" devicetree entry\n");
> > > > > >  				continue;
> > > > > >  			}
> > > > > > +			if (of_property_read_u64(node, "riscv,vendorid", &this_vendorid) < 0) {
> > > > > > +				pr_warn("Unable to find \"riscv,vendorid\" devicetree entry, using boot hart mvendorid instead\n");
> > > > > 
> > > > > This should 100% not be a warning, it's not a required property in the
> > > > > binding.
> > > > 
> > > > Yes definitely, thank you.
> > > > 
> > > > - Charlie
> > > > 
> > > > > 
> > > > > Cheers,
> > > > > Conor.
> > > > > 
> > > > > > +				this_vendorid = boot_vendorid;
> > > > > > +			}
> > > > > 
> > > > 
> > > > 
> > 
> >
Conor Dooley April 12, 2024, 11:40 p.m. UTC | #13
On Fri, Apr 12, 2024 at 02:31:42PM -0700, Charlie Jenkins wrote:
> On Fri, Apr 12, 2024 at 10:27:47PM +0100, Conor Dooley wrote:
> > On Fri, Apr 12, 2024 at 01:48:46PM -0700, Charlie Jenkins wrote:
> > > On Fri, Apr 12, 2024 at 07:47:48PM +0100, Conor Dooley wrote:
> > > > On Fri, Apr 12, 2024 at 10:12:20AM -0700, Charlie Jenkins wrote:

> > > > > This is already falling back on the boot CPU, but that is not a solution
> > > > > that scales. Even though all systems currently have homogenous
> > > > > marchid/mvendorid I am hesitant to assert that all systems are
> > > > > homogenous without providing an option to override this.
> > > > 
> > > > There already is an option. Use the non-deprecated property in your
> > > > new system for describing what extensions you support. We don't need to
> > > > add any more properties (for now at least).
> > > 
> > > The issue is that it is not possible to know which vendor extensions are
> > > associated with a vendor. That requires a global namespace where each
> > > extension can be looked up in a table. I have opted to have a
> > > vendor-specific namespace so that vendors don't have to worry about
> > > stepping on other vendor's toes (or the other way around). In order to
> > > support that, the vendorid of the hart needs to be known prior.
> > 
> > Nah, I think you're mixing up something like hwprobe and having
> > namespaces there with needing namespacing on the devicetree probing side
> > too. You don't need any vendor namespacing, it's perfectly fine (IMO)
> > for a vendor to implement someone else's extension and I think we should
> > allow probing any vendors extension on any CPU.
> 
> I am not mixing it up. Sure a vendor can implement somebody else's
> extension, they just need to add it to their namespace too.

I didn't mean that you were mixing up how your implementation worked, my
point was that you're mixing up the hwprobe stuff which may need
namespacing for $a{b,p}i_reason and probing from DT which does not.
I don't think that the kernel should need to be changed at all if
someone shows up and implements another vendor's extension - we already
have far too many kernel changes required to display support for
extensions and I don't welcome potential for more.

Another thing I just thought of was systems where the SoC vendor
implements some extension that gets communicated in the ISA string but
is not the vendor in mvendorid in their various CPUs. I wouldn't want to
see several different entries in structs (or several different hwprobe
keys, but that's another story) for this situation because you're only
allowing probing what's in the struct matching the vendorid.
Charlie Jenkins April 16, 2024, 3:34 a.m. UTC | #14
On Sat, Apr 13, 2024 at 12:40:26AM +0100, Conor Dooley wrote:
> On Fri, Apr 12, 2024 at 02:31:42PM -0700, Charlie Jenkins wrote:
> > On Fri, Apr 12, 2024 at 10:27:47PM +0100, Conor Dooley wrote:
> > > On Fri, Apr 12, 2024 at 01:48:46PM -0700, Charlie Jenkins wrote:
> > > > On Fri, Apr 12, 2024 at 07:47:48PM +0100, Conor Dooley wrote:
> > > > > On Fri, Apr 12, 2024 at 10:12:20AM -0700, Charlie Jenkins wrote:
> 
> > > > > > This is already falling back on the boot CPU, but that is not a solution
> > > > > > that scales. Even though all systems currently have homogenous
> > > > > > marchid/mvendorid I am hesitant to assert that all systems are
> > > > > > homogenous without providing an option to override this.
> > > > > 
> > > > > There already is an option. Use the non-deprecated property in your
> > > > > new system for describing what extensions you support. We don't need to
> > > > > add any more properties (for now at least).
> > > > 
> > > > The issue is that it is not possible to know which vendor extensions are
> > > > associated with a vendor. That requires a global namespace where each
> > > > extension can be looked up in a table. I have opted to have a
> > > > vendor-specific namespace so that vendors don't have to worry about
> > > > stepping on other vendor's toes (or the other way around). In order to
> > > > support that, the vendorid of the hart needs to be known prior.
> > > 
> > > Nah, I think you're mixing up something like hwprobe and having
> > > namespaces there with needing namespacing on the devicetree probing side
> > > too. You don't need any vendor namespacing, it's perfectly fine (IMO)
> > > for a vendor to implement someone else's extension and I think we should
> > > allow probing any vendors extension on any CPU.
> > 
> > I am not mixing it up. Sure a vendor can implement somebody else's
> > extension, they just need to add it to their namespace too.
> 
> I didn't mean that you were mixing up how your implementation worked, my
> point was that you're mixing up the hwprobe stuff which may need
> namespacing for $a{b,p}i_reason and probing from DT which does not.
> I don't think that the kernel should need to be changed at all if
> someone shows up and implements another vendor's extension - we already
> have far too many kernel changes required to display support for
> extensions and I don't welcome potential for more.

Yes I understand where you are coming from. We do not want it to require
very many changes to add an extension. With this framework, there are
the same number of changes to add a vendor extension as there are to add
a standard extension. There is the upfront cost of creating the struct
for the first vendor extension from a vendor, but after that the
extension only needs to be added to the associated vendor's file (I am
extracting this out to a vendor file in the next version). This is also
a very easy task since the fields from a different vendor can be copied
and adapted.

> 
> Another thing I just thought of was systems where the SoC vendor
> implements some extension that gets communicated in the ISA string but
> is not the vendor in mvendorid in their various CPUs. I wouldn't want to
> see several different entries in structs (or several different hwprobe
> keys, but that's another story) for this situation because you're only
> allowing probing what's in the struct matching the vendorid.

Since the isa string is a per-hart field, the vendor associated with the
hart will be used.

- Charlie
Conor Dooley April 16, 2024, 7:36 a.m. UTC | #15
On Mon, Apr 15, 2024 at 08:34:05PM -0700, Charlie Jenkins wrote:
> On Sat, Apr 13, 2024 at 12:40:26AM +0100, Conor Dooley wrote:
> > On Fri, Apr 12, 2024 at 02:31:42PM -0700, Charlie Jenkins wrote:
> > > On Fri, Apr 12, 2024 at 10:27:47PM +0100, Conor Dooley wrote:
> > > > On Fri, Apr 12, 2024 at 01:48:46PM -0700, Charlie Jenkins wrote:
> > > > > On Fri, Apr 12, 2024 at 07:47:48PM +0100, Conor Dooley wrote:
> > > > > > On Fri, Apr 12, 2024 at 10:12:20AM -0700, Charlie Jenkins wrote:
> > 
> > > > > > > This is already falling back on the boot CPU, but that is not a solution
> > > > > > > that scales. Even though all systems currently have homogenous
> > > > > > > marchid/mvendorid I am hesitant to assert that all systems are
> > > > > > > homogenous without providing an option to override this.
> > > > > > 
> > > > > > There already is an option. Use the non-deprecated property in your
> > > > > > new system for describing what extensions you support. We don't need to
> > > > > > add any more properties (for now at least).
> > > > > 
> > > > > The issue is that it is not possible to know which vendor extensions are
> > > > > associated with a vendor. That requires a global namespace where each
> > > > > extension can be looked up in a table. I have opted to have a
> > > > > vendor-specific namespace so that vendors don't have to worry about
> > > > > stepping on other vendor's toes (or the other way around). In order to
> > > > > support that, the vendorid of the hart needs to be known prior.
> > > > 
> > > > Nah, I think you're mixing up something like hwprobe and having
> > > > namespaces there with needing namespacing on the devicetree probing side
> > > > too. You don't need any vendor namespacing, it's perfectly fine (IMO)
> > > > for a vendor to implement someone else's extension and I think we should
> > > > allow probing any vendors extension on any CPU.
> > > 
> > > I am not mixing it up. Sure a vendor can implement somebody else's
> > > extension, they just need to add it to their namespace too.
> > 
> > I didn't mean that you were mixing up how your implementation worked, my
> > point was that you're mixing up the hwprobe stuff which may need
> > namespacing for $a{b,p}i_reason and probing from DT which does not.
> > I don't think that the kernel should need to be changed at all if
> > someone shows up and implements another vendor's extension - we already
> > have far too many kernel changes required to display support for
> > extensions and I don't welcome potential for more.
> 
> Yes I understand where you are coming from. We do not want it to require
> very many changes to add an extension. With this framework, there are
> the same number of changes to add a vendor extension as there is to add
> a standard extension. 

No, it is actually subtly different. Even if the kernel already supports
the extension, it needs to be patched for each vendor.

> There is the upfront cost of creating the struct
> for the first vendor extension from a vendor, but after that the
> extension only needs to be added to the associated vendor's file (I am
> extracting this out to a vendor file in the next version). This is also
> a very easy task since the fields from a different vendor can be copied
> and adapted.
> 
> > Another thing I just thought of was systems where the SoC vendor
> > implements some extension that gets communicated in the ISA string but
> > is not the vendor in mvendorid in their various CPUs. I wouldn't want to
> > see several different entries in structs (or several different hwprobe
> > keys, but that's another story) for this situation because you're only
> > allowing probing what's in the struct matching the vendorid.
> 
> Since the isa string is a per-hart field, the vendor associated with the
> hart will be used.

I don't know if you just didn't really read what I said or didn't
understand it, but this response doesn't address my comment.
Consider SoC vendor S buys CPUs from vendors A & B and asks both of them
to implement Xsjam. The CPUs have the vendorid of either A or B,
depending on who made it. This scenario should not result in two
different hwprobe keys nor two different in-kernel riscv_has_vendor_ext()
checks to see if the extension is supported. *If* the extension is vendor
namespaced, it should be to the SoC vendor whose extension it is, not
the individual CPU vendors that implemented it.

Additionally, consider that CPUs from both vendors are in the same SoC
and all CPUs support Xsjam. Linux only supports homogeneous extensions
so we should be able to detect that all CPUs support the extension and
use it in a driver etc, but that's either not going to work (or be
difficult to orchestrate) with different mappings per CPU vendor. I saw
your v2 cover letter, in which you said:
  Only patch vendor extension if all harts are associated with the same
  vendor. This is the best chance the kernel has for working properly if
  there are multiple vendors.
I don't think that level of paranoia is required: if firmware tells us
that an extension is supported, then we can trust that those extensions
have been implemented correctly. If the fear of implementation bugs is
what is driving the namespacing that you've gone for, I don't think that
it is required and we can simplify things, with the per-vendor structs
being the vendor of the extension (so SoC vendor S in my example), not
A and B who are the vendors of the CPU IP.
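
A minimal sketch of the scenario above, assuming extensions are keyed by
the extension itself rather than by the CPU vendor (illustrative only,
not code from this series). Each hart's reported extensions are turned
into a bitmap without consulting its vendor ID, and the system-wide set
is the intersection across harts. The names struct hart,
hart_ext_mask() and system_has_ext() are hypothetical, and "xsjam" is
the made-up extension from the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical extension IDs, indexed by extension and not by CPU vendor. */
enum { EXT_XSJAM = 0, EXT_COUNT };

static const char *const ext_names[EXT_COUNT] = {
	[EXT_XSJAM] = "xsjam",
};

struct hart {
	unsigned long vendorid;		/* CPU IP vendor; deliberately unused below */
	const char *const *exts;	/* extension names reported by firmware */
	size_t n_exts;
};

/* Build a bitmap of known extensions for one hart, ignoring its vendor ID. */
static unsigned long hart_ext_mask(const struct hart *h)
{
	unsigned long mask = 0;
	size_t i;
	int e;

	assert(h != NULL);

	for (i = 0; i < h->n_exts; i++)
		for (e = 0; e < EXT_COUNT; e++)
			if (strcmp(h->exts[i], ext_names[e]) == 0)
				mask |= 1UL << e;
	return mask;
}

/* An extension is usable system-wide only if every hart reports it. */
static bool system_has_ext(const struct hart *harts, size_t n, int ext)
{
	unsigned long common = ~0UL;
	size_t i;

	assert(harts != NULL && ext >= 0 && ext < EXT_COUNT);

	if (n == 0)
		return false;
	for (i = 0; i < n; i++)
		common &= hart_ext_mask(&harts[i]);
	return (common >> ext) & 1UL;
}
```

In this model, two harts with different vendor IDs that both list
"xsjam" produce a single positive answer from system_has_ext(), so only
one in-kernel check (and one hwprobe key) would be needed.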

Thanks,
Conor.
Charlie Jenkins April 17, 2024, 4:25 a.m. UTC | #16
On Tue, Apr 16, 2024 at 08:36:33AM +0100, Conor Dooley wrote:
> On Mon, Apr 15, 2024 at 08:34:05PM -0700, Charlie Jenkins wrote:
> > On Sat, Apr 13, 2024 at 12:40:26AM +0100, Conor Dooley wrote:
> > > On Fri, Apr 12, 2024 at 02:31:42PM -0700, Charlie Jenkins wrote:
> > > > On Fri, Apr 12, 2024 at 10:27:47PM +0100, Conor Dooley wrote:
> > > > > On Fri, Apr 12, 2024 at 01:48:46PM -0700, Charlie Jenkins wrote:
> > > > > > On Fri, Apr 12, 2024 at 07:47:48PM +0100, Conor Dooley wrote:
> > > > > > > On Fri, Apr 12, 2024 at 10:12:20AM -0700, Charlie Jenkins wrote:
> > > 
> > > > > > > > This is already falling back on the boot CPU, but that is not a solution
> > > > > > > > that scales. Even though all systems currently have homogenous
> > > > > > > > marchid/mvendorid I am hesitant to assert that all systems are
> > > > > > > > homogenous without providing an option to override this.
> > > > > > > 
> > > > > > > There already is an option. Use the non-deprecated property in your
> > > > > > > new system for describing what extensions you support. We don't need to
> > > > > > > add any more properties (for now at least).
> > > > > > 
> > > > > > The issue is that it is not possible to know which vendor extensions are
> > > > > > associated with a vendor. That requires a global namespace where each
> > > > > > extension can be looked up in a table. I have opted to have a
> > > > > > vendor-specific namespace so that vendors don't have to worry about
> > > > > > stepping on other vendor's toes (or the other way around). In order to
> > > > > > support that, the vendorid of the hart needs to be known prior.
> > > > > 
> > > > > Nah, I think you're mixing up something like hwprobe and having
> > > > > namespaces there with needing namespacing on the devicetree probing side
> > > > > too. You don't need any vendor namespacing, it's perfectly fine (IMO)
> > > > > for a vendor to implement someone else's extension and I think we should
> > > > > allow probing any vendors extension on any CPU.
> > > > 
> > > > I am not mixing it up. Sure a vendor can implement somebody else's
> > > > extension, they just need to add it to their namespace too.
> > > 
> > > I didn't mean that you were mixing up how your implementation worked, my
> > > point was that you're mixing up the hwprobe stuff which may need
> > > namespacing for $a{b,p}i_reason and probing from DT which does not.
> > > I don't think that the kernel should need to be changed at all if
> > > someone shows up and implements another vendor's extension - we already
> > > have far too many kernel changes required to display support for
> > > extensions and I don't welcome potential for more.
> > 
> > Yes I understand where you are coming from. We do not want it to require
> > very many changes to add an extension. With this framework, there are
> > the same number of changes to add a vendor extension as there is to add
> > a standard extension. 
> 
> No, it is actually subtly different. Even if the kernel already supports
> the extension, it needs to be patched for each vendor
> 
> > There is the upfront cost of creating the struct
> > for the first vendor extension from a vendor, but after that the
> > extension only needs to be added to the associated vendor's file (I am
> > extracting this out to a vendor file in the next version). This is also
> > a very easy task since the fields from a different vendor can be copied
> > and adapted.
> > 
> > > Another thing I just thought of was systems where the SoC vendor
> > > implements some extension that gets communicated in the ISA string but
> > > is not the vendor in mvendorid in their various CPUs. I wouldn't want to
> > > see several different entries in structs (or several different hwprobe
> > > keys, but that's another story) for this situation because you're only
> > > allowing probing what's in the struct matching the vendorid.
> > 
> > Since the isa string is a per-hart field, the vendor associated with the
> > hart will be used.
> 
> I don't know if you just didn't really read what I said or didn't
> understand it, but this response doesn't address my comment.

I read what you said! This question seemed to me as another variant of
"what happens when one vendor implements an extension from a different
vendor", and since we already discussed that I was trying to figure out
what you were actually asking.

> Consider SoC vendor S buys CPUs from vendors A & B and asks both of them
> to implement Xsjam. The CPUs have the vendorid of either A or B,
> depending on who made it. This scenario should not result in two
> different hwprobe keys nor two different in-kernel riscv_has_vendor_ext()
> checks to see if the extension is supported. *If* the extension is vendor
> namespaced, it should be to the SoC vendor whose extension it is, not
> the individual CPU vendors that implemented it.
> 
> Additionally, consider that CPUs from both vendors are in the same SoC
> and all CPUs support Xsjam. Linux only supports homogeneous extensions
> so we should be able to detect that all CPUs support the extension and
> use it in a driver etc, but that's either not going to work (or be
> difficult to orchestrate) with different mappings per CPU vendor. I saw
> your v2 cover letter, in which you said:
>   Only patch vendor extension if all harts are associated with the same
>   vendor. This is the best chance the kernel has for working properly if
>   there are multiple vendors.
> I don't think that level of paranoia is required: if firmware tells us
> that an extension is supported, then we can trust that those extensions
> have been implemented correctly. If the fear of implementation bugs is
> what is driving the namespacing that you've gone for, I don't think that
> it is required and we can simplify things, with the per-vendor structs
> being the vendor of the extension (so SoC vendor S in my example), not
> A and B who are the vendors of the CPU IP.
> 
> Thanks,
> Conor.
> 

Thank you for expanding upon this idea further. This solution of
indexing the extensions based on the vendor who proposed them does make
a lot of sense. There are some key differences here of note. When
vendors are able to mix vendor extensions, defining a bitmask that
contains all of the vendor extensions gets a bit messier. I see two
possible solutions.

1. Vendor keys cannot overlap between vendors. A set bit in the bitmask
is associated with exactly one extension.

2. Vendor keys can overlap between vendors. There is a vendor bitmask
per vendor. When setting/checking a vendor extension, first index into
the vendor extension bitmask with the vendor associated with the
extension and then with the key of the vendor extension.

A third option would be to use the standard extension framework. This
causes the standard extension list to become populated with extensions
that most harts will never implement so I am opposed to that.
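
As a rough illustration of option 2, here is a minimal sketch of a
per-vendor bitmask that is indexed first by the vendor that owns the
extension and then by the extension's key. Everything in it is
hypothetical: the vendor indices, the key values, MAX_HARTS and the
function names are made up for this example and are not the in-kernel
implementation. "Xsjam" and "vendor S" are borrowed from Conor's example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_HARTS 8

/* Hypothetical vendor indices (array slots, not mvendorid values). */
enum vendor_idx { VENDOR_THEAD, VENDOR_S, VENDOR_COUNT };

/* Keys may overlap between vendors: both of these are key 0. */
#define THEAD_EXT_XTHEADVECTOR 0
#define S_EXT_XSJAM            0

/* One 64-bit extension bitmask per hart and per extension vendor. */
static uint64_t vendor_isa[MAX_HARTS][VENDOR_COUNT];

/* Record that @hart implements extension @key owned by @vendor. */
void set_vendor_ext(unsigned int hart, enum vendor_idx vendor, unsigned int key)
{
	assert(hart < MAX_HARTS && vendor < VENDOR_COUNT && key < 64);
	vendor_isa[hart][vendor] |= UINT64_C(1) << key;
}

/* Index by the extension's vendor first, then by the extension's key. */
bool hart_has_vendor_ext(unsigned int hart, enum vendor_idx vendor, unsigned int key)
{
	assert(hart < MAX_HARTS && vendor < VENDOR_COUNT && key < 64);
	return (vendor_isa[hart][vendor] >> key) & 1;
}

/* True only if every one of the first @nr_harts harts has the extension. */
bool all_harts_have_vendor_ext(unsigned int nr_harts, enum vendor_idx vendor,
			       unsigned int key)
{
	for (unsigned int hart = 0; hart < nr_harts; hart++)
		if (!hart_has_vendor_ext(hart, vendor, key))
			return false;
	return true;
}
```

Because the lookup never consults the hart's own mvendorid, harts from
CPU vendors A and B that both implement Xsjam would set the same bit in
the VENDOR_S mask.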

This problem carries over into hwprobe since the schemes proposed by
Evan and me both rely on the mvendorid of harts associated with the
cpumask. To have this level of support in hwprobe for SoCs with a mix of
vendors but the same extensions I again see two options:

1. Vendor keys cannot overlap between vendors. A set bit in the bitmask
is associated with exactly one extension. This bitmask would be returned
by the vendor extension hwprobe key.

2. Vendor keys can overlap between vendors. There is an hwprobe key per
vendor. Automatic resolution of the vendor doesn't work because the
vendor-specific feature being requested (extensions in this case) may be
of a vendor that is different than the hart's vendor; in other words,
there are two variables necessary: the vendor and a way to ask hwprobe
for a list of the vendor extensions. With hwprobe there is only the
"key" that can be used to encode these variables simultaneously. We
could have something like a HWPROBE_THEAD_EXT_0 key that would return
all thead vendor extensions supported by the harts corresponding to the
cpumask.
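
A small userspace model of hwprobe option 2 may make the idea more
concrete. This is only a sketch under stated assumptions: the key
numbers, the hart_vendor_exts struct and model_hwprobe() are invented
for illustration, and nothing here calls the real syscall. The pair
layout (a signed key and an unsigned value) and the "unknown key becomes
-1 with value 0" behaviour mirror how hwprobe reports results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key numbers; no such keys are claimed to exist upstream. */
#define HWPROBE_THEAD_EXT_0 1000
#define HWPROBE_S_EXT_0     1001

/* Same shape as a hwprobe pair: a key and the value reported for it. */
struct hwprobe_pair {
	int64_t key;
	uint64_t value;
};

/* Per-hart vendor extension bitmasks, one field per extension vendor. */
struct hart_vendor_exts {
	uint64_t thead;
	uint64_t s;
};

static int key_known(int64_t key)
{
	return key == HWPROBE_THEAD_EXT_0 || key == HWPROBE_S_EXT_0;
}

static uint64_t hart_value(const struct hart_vendor_exts *h, int64_t key)
{
	return key == HWPROBE_THEAD_EXT_0 ? h->thead : h->s;
}

/* Fill each pair with the extension bits common to every hart in @cpumask. */
void model_hwprobe(struct hwprobe_pair *pairs, size_t nr_pairs,
		   const struct hart_vendor_exts *harts, size_t nr_harts,
		   uint64_t cpumask)
{
	assert(nr_harts <= 64);

	for (size_t i = 0; i < nr_pairs; i++) {
		uint64_t common = ~UINT64_C(0);
		size_t selected = 0;

		if (!key_known(pairs[i].key)) {
			/* Unknown keys are reported as key -1, value 0. */
			pairs[i].key = -1;
			pairs[i].value = 0;
			continue;
		}

		for (size_t h = 0; h < nr_harts; h++) {
			if (cpumask & (UINT64_C(1) << h)) {
				common &= hart_value(&harts[h], pairs[i].key);
				selected++;
			}
		}
		pairs[i].value = selected ? common : 0;
	}
}
```

The vendor is encoded in the key itself, so the result does not depend
on the mvendorid of the harts in the cpumask.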

I didn't list the option that we shove all of the vendor extensions into
the same fields that are used for standard extensions because that will
fill up the standard extension probing with all of the vendor extensions
that most SoCs will not care about.

The second option for hwprobe is nice because there are "only" 64 values
supported in the returned bitmask. With a single shared bitmask, if there
ends up being a lot of vendor extensions that need to be exposed, we would
end up with a lot of unused bits on most systems.

For the internal kernel structures it matters less (or doesn't matter at
all) since it's not exposed to userspace and it can always change.
Having consistency is nice for developers though so it would be my
preference to have schemes that reflect each other for the in-kernel
structures and hwprobe.

Thank you for working this problem out with me. I know there is a lot of
text I am pushing here, hopefully we can design something that doesn't
need to be re-written in the future.

- Charlie
Evan Green April 17, 2024, 4:02 p.m. UTC | #17
On Tue, Apr 16, 2024 at 9:25 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> On Tue, Apr 16, 2024 at 08:36:33AM +0100, Conor Dooley wrote:
> > On Mon, Apr 15, 2024 at 08:34:05PM -0700, Charlie Jenkins wrote:
> > > On Sat, Apr 13, 2024 at 12:40:26AM +0100, Conor Dooley wrote:
> > > > On Fri, Apr 12, 2024 at 02:31:42PM -0700, Charlie Jenkins wrote:
> > > > > On Fri, Apr 12, 2024 at 10:27:47PM +0100, Conor Dooley wrote:
> > > > > > On Fri, Apr 12, 2024 at 01:48:46PM -0700, Charlie Jenkins wrote:
> > > > > > > On Fri, Apr 12, 2024 at 07:47:48PM +0100, Conor Dooley wrote:
> > > > > > > > On Fri, Apr 12, 2024 at 10:12:20AM -0700, Charlie Jenkins wrote:
> > > >
> > > > > > > > > This is already falling back on the boot CPU, but that is not a solution
> > > > > > > > > that scales. Even though all systems currently have homogenous
> > > > > > > > > marchid/mvendorid I am hesitant to assert that all systems are
> > > > > > > > > homogenous without providing an option to override this.
> > > > > > > >
> > > > > > > > There is already an option. Use the non-deprecated property in your
> > > > > > > > new system for describing what extensions you support. We don't need to
> > > > > > > > add any more properties (for now at least).
> > > > > > >
> > > > > > > The issue is that it is not possible to know which vendor extensions are
> > > > > > > associated with a vendor. That requires a global namespace where each
> > > > > > > extension can be looked up in a table. I have opted to have a
> > > > > > > vendor-specific namespace so that vendors don't have to worry about
> > > > > > > stepping on other vendor's toes (or the other way around). In order to
> > > > > > > support that, the vendorid of the hart needs to be known prior.
> > > > > >
> > > > > > Nah, I think you're mixing up something like hwprobe and having
> > > > > > namespaces there with needing namespacing on the devicetree probing side
> > > > > > too. You don't need any vendor namespacing, it's perfectly fine (IMO)
> > > > > > for a vendor to implement someone else's extension and I think we should
> > > > > > allow probing any vendors extension on any CPU.
> > > > >
> > > > > I am not mixing it up. Sure a vendor can implement somebody else's
> > > > > extension, they just need to add it to their namespace too.
> > > >
> > > > I didn't mean that you were mixing up how your implementation worked, my
> > > > point was that you're mixing up the hwprobe stuff which may need
> > > > namespacing for $a{b,p}i_reason and probing from DT which does not.
> > > > I don't think that the kernel should need to be changed at all if
> > > > someone shows up and implements another vendor's extension - we already
> > > > have far too many kernel changes required to display support for
> > > > extensions and I don't welcome potential for more.
> > >
> > > Yes I understand where you are coming from. We do not want it to require
> > > very many changes to add an extension. With this framework, there are
> > > the same number of changes to add a vendor extension as there is to add
> > > a standard extension.
> >
> > No, it is actually subtly different. Even if the kernel already supports
> > the extension, it needs to be patched for each vendor
> >
> > > There is the upfront cost of creating the struct
> > > for the first vendor extension from a vendor, but after that the
> > > extension only needs to be added to the associated vendor's file (I am
> > > extracting this out to a vendor file in the next version). This is also
> > > a very easy task since the fields from a different vendor can be copied
> > > and adapted.
> > >
> > > > Another thing I just thought of was systems where the SoC vendor
> > > > implements some extension that gets communicated in the ISA string but
> > > > is not the vendor in mvendorid in their various CPUs. I wouldn't want to
> > > > see several different entries in structs (or several different hwprobe
> > > > keys, but that's another story) for this situation because you're only
> > > > allowing probing what's in the struct matching the vendorid.
> > >
> > > Since the isa string is a per-hart field, the vendor associated with the
> > > hart will be used.
> >
> > I don't know if you just didn't really read what I said or didn't
> > understand it, but this response doesn't address my comment.
>
> I read what you said! This question seemed to me as another variant of
> "what happens when one vendor implements an extension from a different
> vendor", and since we already discussed that I was trying to figure out
> what you were actually asking.
>
> > Consider SoC vendor S buys CPUs from vendors A & B and asks both of them
> > to implement Xsjam. The CPUs have the vendorid of either A or B,
> > depending on who made it. This scenario should not result in two
> > different hwprobe keys nor two different in-kernel riscv_has_vendor_ext()
> > checks to see if the extension is supported. *If* the extension is vendor
> > namespaced, it should be to the SoC vendor whose extension it is, not
> > the individual CPU vendors that implemented it.
> >
> > Additionally, consider that CPUs from both vendors are in the same SoC
> > and all CPUs support Xsjam. Linux only supports homogeneous extensions
> > so we should be able to detect that all CPUs support the extension and
> > use it in a driver etc, but that's either not going to work (or be
> > difficult to orchestrate) with different mappings per CPU vendor. I saw
> > your v2 cover letter, in which you said:
> >   Only patch vendor extension if all harts are associated with the same
> >   vendor. This is the best chance the kernel has for working properly if
> >   there are multiple vendors.
> > I don't think that level of paranoia is required: if firmware tells us
> > that an extension is supported, then we can trust that those extensions
> > have been implemented correctly. If the fear of implementation bugs is
> > what is driving the namespacing that you've gone for, I don't think that
> > it is required and we can simplify things, with the per-vendor structs
> > being the vendor of the extension (so SoC vendor S in my example), not
> > A and B who are the vendors of the CPU IP.
> >
> > Thanks,
> > Conor.
> >
>
> Thank you for expanding upon this idea further. This solution of
> indexing the extensions based on the vendor who proposed them does make
> a lot of sense. There are some key differences here of note. When
> vendors are able to mix vendor extensions, defining a bitmask that
> contains all of the vendor extensions gets a bit messier. I see two
> possible solutions.
>
> 1. Vendor keys cannot overlap between vendors. A set bit in the bitmask
> is associated with exactly one extension.
>
> 2. Vendor keys can overlap between vendors. There is a vendor bitmask
> per vendor. When setting/checking a vendor extension, first index into
> the vendor extension bitmask with the vendor associated with the
> extension and then with the key of the vendor extension.
>
> A third option would be to use the standard extension framework. This
> causes the standard extension list to become populated with extensions
> that most harts will never implement so I am opposed to that.
>
> This problem carries over into hwprobe since the schemes proposed by
> Evan and I both rely on the mvendorid of harts associated with the
> cpumask. To have this level of support in hwprobe for SoCs with a mix of
> vendors but the same extensions I again see two options:
>
> 1. Vendor keys cannot overlap between vendors. A set bit in the bitmask
> is associated with exactly one extension. This bitmask would be returned
> by the vendor extension hwprobe key.
>
> 2. Vendor keys can overlap between vendors. There is an hwprobe key per
> vendor. Automatic resolution of the vendor doesn't work because the
> vendor-specific feature being requested (extensions in this case) may be
> of a vendor that is different than the hart's vendor; in other words,
> there are two variables necessary: the vendor and a way to ask hwprobe
> for a list of the vendor extensions. With hwprobe there is only the
> "key" that can be used to encode these variables simultaneously. We
> could have something like a HWPROBE_THEAD_EXT_0 key that would return
> all thead vendor extensions supported by the harts corresponding to the
> cpumask.

I was a big proponent of the vendor namespacing in hwprobe, as I liked
the tidiness of it, and felt it could handle most cases (including
mix-n-matching multiple mvendorids in a single SoC). However my
balloon lost its air after chatting with Palmer, as there's one case
it really can't handle: white labeling. This is where I buy a THead
(for instance) CPU for my SoC, including all its vendor extensions,
and do nothing but change the mvendorid to my own. If this is a thing,
then the vendor extensions basically have to be a single global
namespace in hwprobe (sigh).
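
The white labeling problem can be shown with a toy lookup. This is a
sketch only: the table layout, WHITE_LABEL_VENDOR_ID and both lookup
functions are hypothetical and exist purely to contrast the two schemes.
THEAD_VENDOR_ID is given the value the kernel uses for T-Head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define THEAD_VENDOR_ID       0x5b7UL  /* T-Head's mvendorid */
#define WHITE_LABEL_VENDOR_ID 0x123UL  /* hypothetical relabeled mvendorid */

struct vendor_ext_table {
	unsigned long mvendorid;
	const char *const *exts;
	size_t nr_exts;
};

static const char *const thead_exts[] = { "xtheadvector" };

static const struct vendor_ext_table vendor_tables[] = {
	{ THEAD_VENDOR_ID, thead_exts, sizeof(thead_exts) / sizeof(thead_exts[0]) },
};
#define NR_TABLES (sizeof(vendor_tables) / sizeof(vendor_tables[0]))

static bool table_has_ext(const struct vendor_ext_table *t, const char *ext)
{
	for (size_t i = 0; i < t->nr_exts; i++)
		if (strcmp(t->exts[i], ext) == 0)
			return true;
	return false;
}

/* Namespaced scheme: only the table matching the hart's mvendorid is searched. */
bool namespaced_lookup(unsigned long hart_mvendorid, const char *ext)
{
	assert(ext != NULL);
	for (size_t i = 0; i < NR_TABLES; i++)
		if (vendor_tables[i].mvendorid == hart_mvendorid)
			return table_has_ext(&vendor_tables[i], ext);
	return false;
}

/* Global scheme: every vendor's table is searched regardless of mvendorid. */
bool global_lookup(const char *ext)
{
	assert(ext != NULL);
	for (size_t i = 0; i < NR_TABLES; i++)
		if (table_has_ext(&vendor_tables[i], ext))
			return true;
	return false;
}
```

A relabeled hart reporting WHITE_LABEL_VENDOR_ID misses "xtheadvector"
under the namespaced lookup even though the hardware implements it,
while the global lookup still finds it.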

I do like Charlie's idea of at least letting vendors allocate a key at
a time, eg HWPROBE_THEAD_EXT_0, rather than racing to allocate a bit
at a time in a key like HWPROBE_VENDOR_EXT_0. That gives it some
semblance of organization, and still gives us a chance of a
cleanup/deprecation path for vendors that stop producing chips.
-Evan
Charlie Jenkins April 17, 2024, 10:02 p.m. UTC | #18
On Wed, Apr 17, 2024 at 09:02:05AM -0700, Evan Green wrote:
> On Tue, Apr 16, 2024 at 9:25 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >
> > On Tue, Apr 16, 2024 at 08:36:33AM +0100, Conor Dooley wrote:
> > > On Mon, Apr 15, 2024 at 08:34:05PM -0700, Charlie Jenkins wrote:
> > > > On Sat, Apr 13, 2024 at 12:40:26AM +0100, Conor Dooley wrote:
> > > > > On Fri, Apr 12, 2024 at 02:31:42PM -0700, Charlie Jenkins wrote:
> > > > > > On Fri, Apr 12, 2024 at 10:27:47PM +0100, Conor Dooley wrote:
> > > > > > > On Fri, Apr 12, 2024 at 01:48:46PM -0700, Charlie Jenkins wrote:
> > > > > > > > On Fri, Apr 12, 2024 at 07:47:48PM +0100, Conor Dooley wrote:
> > > > > > > > > On Fri, Apr 12, 2024 at 10:12:20AM -0700, Charlie Jenkins wrote:
> > > > >
> > > > > > > > > > This is already falling back on the boot CPU, but that is not a solution
> > > > > > > > > > that scales. Even though all systems currently have homogenous
> > > > > > > > > > marchid/mvendorid I am hesitant to assert that all systems are
> > > > > > > > > > homogenous without providing an option to override this.
> > > > > > > > >
> > > > > > > > > There is already an option. Use the non-deprecated property in your
> > > > > > > > > new system for describing what extensions you support. We don't need to
> > > > > > > > > add any more properties (for now at least).
> > > > > > > >
> > > > > > > > The issue is that it is not possible to know which vendor extensions are
> > > > > > > > associated with a vendor. That requires a global namespace where each
> > > > > > > > extension can be looked up in a table. I have opted to have a
> > > > > > > > vendor-specific namespace so that vendors don't have to worry about
> > > > > > > > stepping on other vendor's toes (or the other way around). In order to
> > > > > > > > support that, the vendorid of the hart needs to be known prior.
> > > > > > >
> > > > > > > Nah, I think you're mixing up something like hwprobe and having
> > > > > > > namespaces there with needing namespacing on the devicetree probing side
> > > > > > > too. You don't need any vendor namespacing, it's perfectly fine (IMO)
> > > > > > > for a vendor to implement someone else's extension and I think we should
> > > > > > > allow probing any vendors extension on any CPU.
> > > > > >
> > > > > > I am not mixing it up. Sure a vendor can implement somebody else's
> > > > > > extension, they just need to add it to their namespace too.
> > > > >
> > > > > I didn't mean that you were mixing up how your implementation worked, my
> > > > > point was that you're mixing up the hwprobe stuff which may need
> > > > > namespacing for $a{b,p}i_reason and probing from DT which does not.
> > > > > I don't think that the kernel should need to be changed at all if
> > > > > someone shows up and implements another vendor's extension - we already
> > > > > have far too many kernel changes required to display support for
> > > > > extensions and I don't welcome potential for more.
> > > >
> > > > Yes I understand where you are coming from. We do not want it to require
> > > > very many changes to add an extension. With this framework, there are
> > > > the same number of changes to add a vendor extension as there is to add
> > > > a standard extension.
> > >
> > > No, it is actually subtly different. Even if the kernel already supports
> > > the extension, it needs to be patched for each vendor
> > >
> > > > There is the upfront cost of creating the struct
> > > > for the first vendor extension from a vendor, but after that the
> > > > extension only needs to be added to the associated vendor's file (I am
> > > > extracting this out to a vendor file in the next version). This is also
> > > > a very easy task since the fields from a different vendor can be copied
> > > > and adapted.
> > > >
> > > > > Another thing I just thought of was systems where the SoC vendor
> > > > > implements some extension that gets communicated in the ISA string but
> > > > > is not the vendor in mvendorid in their various CPUs. I wouldn't want to
> > > > > see several different entries in structs (or several different hwprobe
> > > > > keys, but that's another story) for this situation because you're only
> > > > > allowing probing what's in the struct matching the vendorid.
> > > >
> > > > Since the isa string is a per-hart field, the vendor associated with the
> > > > hart will be used.
> > >
> > > I don't know if you just didn't really read what I said or didn't
> > > understand it, but this response doesn't address my comment.
> >
> > I read what you said! This question seemed to me as another variant of
> > "what happens when one vendor implements an extension from a different
> > vendor", and since we already discussed that I was trying to figure out
> > what you were actually asking.
> >
> > > Consider SoC vendor S buys CPUs from vendors A & B and asks both of them
> > > to implement Xsjam. The CPUs have the vendorid of either A or B,
> > > depending on who made it. This scenario should not result in two
> > > different hwprobe keys nor two different in-kernel riscv_has_vendor_ext()
> > > checks to see if the extension is supported. *If* the extension is vendor
> > > namespaced, it should be to the SoC vendor whose extension it is, not
> > > the individual CPU vendors that implemented it.
> > >
> > > Additionally, consider that CPUs from both vendors are in the same SoC
> > > and all CPUs support Xsjam. Linux only supports homogeneous extensions
> > > so we should be able to detect that all CPUs support the extension and
> > > use it in a driver etc, but that's either not going to work (or be
> > > difficult to orchestrate) with different mappings per CPU vendor. I saw
> > > your v2 cover letter, in which you said:
> > >   Only patch vendor extension if all harts are associated with the same
> > >   vendor. This is the best chance the kernel has for working properly if
> > >   there are multiple vendors.
> > > I don't think that level of paranoia is required: if firmware tells us
> > > that an extension is supported, then we can trust that those extensions
> > > have been implemented correctly. If the fear of implementation bugs is
> > > what is driving the namespacing that you've gone for, I don't think that
> > > it is required and we can simplify things, with the per-vendor structs
> > > being the vendor of the extension (so SoC vendor S in my example), not
> > > A and B who are the vendors of the CPU IP.
> > >
> > > Thanks,
> > > Conor.
> > >
> >
> > Thank you for expanding upon this idea further. This solution of
> > indexing the extensions based on the vendor who proposed them does make
> > a lot of sense. There are some key differences here of note. When
> > vendors are able to mix vendor extensions, defining a bitmask that
> > contains all of the vendor extensions gets a bit messier. I see two
> > possible solutions.
> >
> > 1. Vendor keys cannot overlap between vendors. A set bit in the bitmask
> > is associated with exactly one extension.
> >
> > 2. Vendor keys can overlap between vendors. There is a vendor bitmask
> > per vendor. When setting/checking a vendor extension, first index into
> > the vendor extension bitmask with the vendor associated with the
> > extension and then with the key of the vendor extension.
> >
> > A third option would be to use the standard extension framework. This
> > causes the standard extension list to become populated with extensions
> > that most harts will never implement so I am opposed to that.
> >
> > This problem carries over into hwprobe since the schemes proposed by
> > Evan and I both rely on the mvendorid of harts associated with the
> > cpumask. To have this level of support in hwprobe for SoCs with a mix of
> > vendors but the same extensions I again see two options:
> >
> > 1. Vendor keys cannot overlap between vendors. A set bit in the bitmask
> > is associated with exactly one extension. This bitmask would be returned
> > by the vendor extension hwprobe key.
> >
> > 2. Vendor keys can overlap between vendors. There is an hwprobe key per
> > vendor. Automatic resolution of the vendor doesn't work because the
> > vendor-specific feature being requested (extensions in this case) may be
> > of a vendor that is different than the hart's vendor; in other words,
> > there are two variables necessary: the vendor and a way to ask hwprobe
> > for a list of the vendor extensions. With hwprobe there is only the
> > "key" that can be used to encode these variables simultaneously. We
> > could have something like a HWPROBE_THEAD_EXT_0 key that would return
> > all thead vendor extensions supported by the harts corresponding to the
> > cpumask.
> 
> I was a big proponent of the vendor namespacing in hwprobe, as I liked
> the tidiness of it, and felt it could handle most cases (including
> mix-n-matching multiple mvendorids in a single SoC). However my
> balloon lost its air after chatting with Palmer, as there's one case
> it really can't handle: white labeling. This is where I buy a THead
> (for instance) CPU for my SoC, including all its vendor extensions,
> and do nothing but change the mvendorid to my own. If this is a thing,
> then the vendor extensions basically have to be a single global
> namespace in hwprobe (sigh).
> 
> I do like Charlie's idea of at least letting vendors allocate a key at
> a time, eg HWPROBE_THEAD_EXT_0, rather than racing to allocate a bit
> at a time in a key like HWPROBE_VENDOR_EXT_0. That gives it some
> semblance of organization, and still gives us a chance of a
> cleanup/deprecation path for vendors that stop producing chips.
> -Evan

Okay I will send a v3 following that method!

- Charlie
Patch

diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 6e68f8dff76b..0fab508a65b3 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -370,6 +370,8 @@  static inline int sbi_remote_fence_i(const struct cpumask *cpu_mask) { return -1
 static inline void sbi_init(void) {}
 #endif /* CONFIG_RISCV_SBI */
 
+unsigned long riscv_get_mvendorid(void);
+unsigned long riscv_get_marchid(void);
 unsigned long riscv_cached_mvendorid(unsigned int cpu_id);
 unsigned long riscv_cached_marchid(unsigned int cpu_id);
 unsigned long riscv_cached_mimpid(unsigned int cpu_id);
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index d11d6320fb0d..08319a819f32 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -139,6 +139,26 @@  int riscv_of_parent_hartid(struct device_node *node, unsigned long *hartid)
 	return -1;
 }
 
+unsigned long __init riscv_get_marchid(void)
+{
+#if IS_ENABLED(CONFIG_RISCV_SBI)
+	return sbi_spec_is_0_1() ? 0 : sbi_get_marchid();
+#elif IS_ENABLED(CONFIG_RISCV_M_MODE)
+	return csr_read(CSR_MARCHID);
+#endif
+	return 0;
+}
+
+unsigned long __init riscv_get_mvendorid(void)
+{
+#if IS_ENABLED(CONFIG_RISCV_SBI)
+	return sbi_spec_is_0_1() ? 0 : sbi_get_mvendorid();
+#elif IS_ENABLED(CONFIG_RISCV_M_MODE)
+	return csr_read(CSR_MVENDORID);
+#endif
+	return 0;
+}
+
 DEFINE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo);
 
 unsigned long riscv_cached_mvendorid(unsigned int cpu_id)
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 3ed2359eae35..cd156adbeb66 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -490,6 +490,8 @@  static void __init riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap)
 	struct acpi_table_header *rhct;
 	acpi_status status;
 	unsigned int cpu;
+	u64 boot_vendorid;
+	u64 boot_archid;
 
 	if (!acpi_disabled) {
 		status = acpi_get_table(ACPI_SIG_RHCT, 0, &rhct);
@@ -497,9 +499,14 @@  static void __init riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap)
 			return;
 	}
 
+	boot_vendorid = riscv_get_mvendorid();
+	boot_archid = riscv_get_marchid();
+
 	for_each_possible_cpu(cpu) {
 		struct riscv_isainfo *isainfo = &hart_isa[cpu];
 		unsigned long this_hwcap = 0;
+		u64 this_vendorid;
+		u64 this_archid;
 
 		if (acpi_disabled) {
 			node = of_cpu_device_node_get(cpu);
@@ -514,12 +521,23 @@  static void __init riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap)
 				pr_warn("Unable to find \"riscv,isa\" devicetree entry\n");
 				continue;
 			}
+			if (of_property_read_u64(node, "riscv,vendorid", &this_vendorid) < 0) {
+				pr_warn("Unable to find \"riscv,vendorid\" devicetree entry, using boot hart mvendorid instead\n");
+				this_vendorid = boot_vendorid;
+			}
+
+			if (of_property_read_u64(node, "riscv,archid", &this_archid) < 0) {
+				pr_warn("Unable to find \"riscv,archid\" devicetree entry, using boot hart marchid instead\n");
+				this_archid = boot_archid;
+			}
 		} else {
 			rc = acpi_get_riscv_isa(rhct, cpu, &isa);
 			if (rc < 0) {
 				pr_warn("Unable to get ISA for the hart - %d\n", cpu);
 				continue;
 			}
+			this_vendorid = boot_vendorid;
+			this_archid = boot_archid;
 		}
 
 		riscv_parse_isa_string(&this_hwcap, isainfo, isa2hwcap, isa);
@@ -544,8 +562,8 @@  static void __init riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap)
 		 * CPU cores with the ratified spec will contain non-zero
 		 * marchid.
 		 */
-		if (acpi_disabled && riscv_cached_mvendorid(cpu) == THEAD_VENDOR_ID &&
-		    riscv_cached_marchid(cpu) == 0x0) {
+		if (acpi_disabled && this_vendorid == THEAD_VENDOR_ID &&
+		    this_archid == 0x0) {
 			this_hwcap &= ~isa2hwcap[RISCV_ISA_EXT_v];
 			clear_bit(RISCV_ISA_EXT_v, isainfo->isa);
 		}
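
The fallback described in the commit message, reduced to a standalone
sketch. The dt_u64_prop struct and both helper names are hypothetical
stand-ins for the devicetree lookup and are not part of the patch; only
the decision logic (use the DT value if present, otherwise the boot
hart's value, then clear V for T-Head harts with marchid 0) follows the
diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define THEAD_VENDOR_ID 0x5b7

/* Hypothetical stand-in for an optional u64 devicetree property. */
struct dt_u64_prop {
	bool present;
	uint64_t value;
};

/* Use the per-hart DT value when present, else assume the boot hart's value. */
uint64_t id_or_boot_fallback(const struct dt_u64_prop *prop, uint64_t boot_value)
{
	assert(prop != NULL);
	return prop->present ? prop->value : boot_value;
}

/* Older T-Head cores report marchid 0 and implement a pre-ratification vector. */
bool should_clear_v(uint64_t vendorid, uint64_t archid)
{
	return vendorid == THEAD_VENDOR_ID && archid == 0x0;
}
```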