diff mbox series

[v1,1/2] RISC-V: skip parsing multi-letter extensions starting with caps

Message ID 20230426-devalue-enlarging-afb4fa1bb247@wendy (mailing list archive)
State Rejected
Headers show
Series Handle multi-letter extensions starting with caps in riscv,isa | expand

Checks

Context Check Description
conchuod/cover_letter success Series has a cover letter
conchuod/tree_selection success Guessed tree name to be fixes at HEAD 1b50f956c8fe
conchuod/fixes_present success Fixes tag present in non-next series
conchuod/maintainers_pattern success MAINTAINERS pattern errors before the patch: 1 and now 1
conchuod/verify_signedoff success Signed-off-by tag matches author and committer
conchuod/kdoc success Errors and warnings before: 0 this patch: 0
conchuod/build_rv64_clang_allmodconfig success Errors and warnings before: 18 this patch: 18
conchuod/module_param success Was 0 now: 0
conchuod/build_rv64_gcc_allmodconfig success Errors and warnings before: 18 this patch: 18
conchuod/build_rv32_defconfig success Build OK
conchuod/dtb_warn_rv64 success Errors and warnings before: 3 this patch: 3
conchuod/header_inline success No static functions without inline keyword in header files
conchuod/checkpatch success total: 0 errors, 0 warnings, 0 checks, 36 lines checked
conchuod/source_inline success Was 0 now: 0
conchuod/build_rv64_nommu_k210_defconfig success Build OK
conchuod/verify_fixes success Fixes tag looks correct
conchuod/build_rv64_nommu_virt_defconfig success Build OK

Commit Message

Conor Dooley April 26, 2023, 10:43 a.m. UTC
Yangyu Chen reported that if an multi-letter extension begins with a
capital letter the parser will treat the remainder of that multi-letter
extension as single-letter extensions.

Certain versions of rocket-chip will export devicetree containing
rv64ima_Zifencei, which is parsed by the kernel as rv64imafc.

While capital letters in riscv,isa are invalid and the validation of
devicetree's isn't the kernel's job, we should behave more gracefully
here. Rather than abort parsing on meeting a capital letter, mark the
extension as an error & allow the parser to skip ahead to the next
extension.

Reported-by: Yangyu Chen <cyy@cyyself.name>
Link: https://lore.kernel.org/all/tencent_1647475C9618C390BEC601BE2CC1206D0C07@qq.com/
Fixes: 2a31c54be097 ("RISC-V: Minimal parser for "riscv, isa" strings")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
---
 arch/riscv/kernel/cpufeature.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

Comments

Andrew Jones April 26, 2023, 12:18 p.m. UTC | #1
On Wed, Apr 26, 2023 at 11:43:24AM +0100, Conor Dooley wrote:
> Yangyu Chen reported that if an multi-letter extension begins with a
> capital letter the parser will treat the remainder of that multi-letter
> extension as single-letter extensions.

I think the problem is that the parser doesn't completely abort when
it sees something it doesn't understand. Continuing is risky since
it may be possible to compose an invalid string that gets the parser
to run off the rails.

How about completely aborting, noisily, when the string doesn't match
expectations, falling back to a default string such as rv64ima instead.
That also ought to get faster corrections of device trees.

> 
> Certain versions of rocket-chip will export devicetree containing
> rv64ima_Zifencei, which is parsed by the kernel as rv64imafc.
> 
> While capital letters in riscv,isa are invalid and the validation of
> devicetree's isn't the kernel's job, we should behave more gracefully
> here. Rather than abort parsing on meeting a capital letter, mark the
> extension as an error & allow the parser to skip ahead to the next
> extension.
> 
> Reported-by: Yangyu Chen <cyy@cyyself.name>
> Link: https://lore.kernel.org/all/tencent_1647475C9618C390BEC601BE2CC1206D0C07@qq.com/
> Fixes: 2a31c54be097 ("RISC-V: Minimal parser for "riscv, isa" strings")
> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
> ---
>  arch/riscv/kernel/cpufeature.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 52585e088873..93850540b0b4 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -142,6 +142,10 @@ void __init riscv_fill_hwcap(void)
>  			const char *ext_end = isa;
>  			bool ext_long = false, ext_err = false;
>  
> +			if (unlikely(!islower(*ext))) {
> +				ext_err = true;
> +			}

Can drop the {}

> +
>  			switch (*ext) {
>  			case 's':
>  				/**
> @@ -156,6 +160,15 @@ void __init riscv_fill_hwcap(void)
>  					break;
>  				}
>  				fallthrough;
> +			case 'S':
> +			case 'X':
> +			case 'Z':
> +				/*
> +				 * As the riscv,isa string must be lower-case,
> +				 * S, X and Z are not valid characters. Parse
> +				 * the invalid extension anyway, to skip ahead
> +				 * to the next valid one.
> +				 */
>  			case 'x':
>  			case 'z':
>  				ext_long = true;
> @@ -185,10 +198,8 @@ void __init riscv_fill_hwcap(void)
>  				++ext_end;
>  				break;
>  			default:
> -				if (unlikely(!islower(*ext))) {
> -					ext_err = true;
> +				if (unlikely(ext_err))
>  					break;
> -				}
>  				/* Find next extension */
>  				if (!isdigit(*isa))
>  					break;
> -- 
> 2.39.2
>

Thanks,
drew
Conor Dooley April 26, 2023, 12:47 p.m. UTC | #2
On Wed, Apr 26, 2023 at 02:18:52PM +0200, Andrew Jones wrote:
> On Wed, Apr 26, 2023 at 11:43:24AM +0100, Conor Dooley wrote:
> > Yangyu Chen reported that if an multi-letter extension begins with a
> > capital letter the parser will treat the remainder of that multi-letter
> > extension as single-letter extensions.
> 
> I think the problem is that the parser doesn't completely abort when
> it sees something it doesn't understand. Continuing is risky since
> it may be possible to compose an invalid string that gets the parser
> to run off the rails.

Usually I am of the opinion that we should not seek the validate the dt
in the kernel, since there are tools for doing so *cough* dt-validate
*cough*. This one seemed like low hanging fruit though, since the parser
handles having capital letters in any of the other places after the
rv##, but falls over pretty badly for this particular issue.

In general, I don't think we need to be concerned about anything that
fails dt-validate though, you kinda need to trust that that is correct.
I'd argue that we might even do too much validation in the parser at
present.
Is there some attack vector, or ACPI related consideration, that I am
unaware of that makes this risky?

> How about completely aborting, noisily, when the string doesn't match
> expectations, falling back to a default string such as rv64ima instead.
> That also ought to get faster corrections of device trees.

I did this first actually, but I was afraid that it would cause
regressions?

If you have riscv,isa = "rv64imafdc_Zifencei_zicbom", yes that is
invalid and dt-validate would have told you so, but at present that
would be parsed as "rv64imafdc_zicbom" which is a perfect description of
the hardware in question (since the meaning of i was set before RVI made
a hames of things).

So that's why I opted to not do some sort of pr_err/BUG()/WARN() and
try to keep processing the string. I'm happy to abort entirely on
reaching a capital if people feel there's unlikely to be a fallout from
that.

Cheers,
Conor.
Andrew Jones April 26, 2023, 1:08 p.m. UTC | #3
On Wed, Apr 26, 2023 at 01:47:39PM +0100, Conor Dooley wrote:
> On Wed, Apr 26, 2023 at 02:18:52PM +0200, Andrew Jones wrote:
> > On Wed, Apr 26, 2023 at 11:43:24AM +0100, Conor Dooley wrote:
> > > Yangyu Chen reported that if an multi-letter extension begins with a
> > > capital letter the parser will treat the remainder of that multi-letter
> > > extension as single-letter extensions.
> > 
> > I think the problem is that the parser doesn't completely abort when
> > it sees something it doesn't understand. Continuing is risky since
> > it may be possible to compose an invalid string that gets the parser
> > to run off the rails.
> 
> Usually I am of the opinion that we should not seek the validate the dt
> in the kernel, since there are tools for doing so *cough* dt-validate
> *cough*. This one seemed like low hanging fruit though, since the parser
> handles having capital letters in any of the other places after the
> rv##, but falls over pretty badly for this particular issue.
> 
> In general, I don't think we need to be concerned about anything that
> fails dt-validate though, you kinda need to trust that that is correct.
> I'd argue that we might even do too much validation in the parser at
> present.
> Is there some attack vector, or ACPI related consideration, that I am
> unaware of that makes this risky?

C language + string processing == potential attack vector

> 
> > How about completely aborting, noisily, when the string doesn't match
> > expectations, falling back to a default string such as rv64ima instead.
> > That also ought to get faster corrections of device trees.
> 
> I did this first actually, but I was afraid that it would cause
> regressions?
> 
> If you have riscv,isa = "rv64imafdc_Zifencei_zicbom", yes that is
> invalid and dt-validate would have told you so, but at present that
> would be parsed as "rv64imafdc_zicbom" which is a perfect description of
> the hardware in question (since the meaning of i was set before RVI made
> a hames of things).
> 
> So that's why I opted to not do some sort of pr_err/BUG()/WARN() and
> try to keep processing the string. I'm happy to abort entirely on
> reaching a capital if people feel there's unlikely to be a fallout from
> that.

There might be fallout, but the kernel needs to defend itself. IMO, if
the kernel doesn't know how to parse something, then it should stop
trying to immediately, either with a BUG(), refusing to accept any
part of it, by fallbacking back to a default, or by only accepting what
it believes it parsed correctly.

The third option is probably a reasonable choice in this case.

Thanks,
drew
Andrew Jones April 26, 2023, 1:54 p.m. UTC | #4
On Wed, Apr 26, 2023 at 03:08:25PM +0200, Andrew Jones wrote:
> On Wed, Apr 26, 2023 at 01:47:39PM +0100, Conor Dooley wrote:
> > On Wed, Apr 26, 2023 at 02:18:52PM +0200, Andrew Jones wrote:
> > > On Wed, Apr 26, 2023 at 11:43:24AM +0100, Conor Dooley wrote:
> > > > Yangyu Chen reported that if an multi-letter extension begins with a
> > > > capital letter the parser will treat the remainder of that multi-letter
> > > > extension as single-letter extensions.
> > > 
> > > I think the problem is that the parser doesn't completely abort when
> > > it sees something it doesn't understand. Continuing is risky since
> > > it may be possible to compose an invalid string that gets the parser
> > > to run off the rails.
> > 
> > Usually I am of the opinion that we should not seek the validate the dt
> > in the kernel, since there are tools for doing so *cough* dt-validate
> > *cough*. This one seemed like low hanging fruit though, since the parser
> > handles having capital letters in any of the other places after the
> > rv##, but falls over pretty badly for this particular issue.
> > 
> > In general, I don't think we need to be concerned about anything that
> > fails dt-validate though, you kinda need to trust that that is correct.
> > I'd argue that we might even do too much validation in the parser at
> > present.
> > Is there some attack vector, or ACPI related consideration, that I am
> > unaware of that makes this risky?

A bit unrelated to this, but your mention of ACPI made me go look at the
approved ECR[1] again for the ISA string. It says "Null-terminated ASCII
Instruction Set Architecture (ISA) string for this hart. The format of the
ISA string is defined in the RISC-V unprivileged specification." I suppose
we can still add additional requirements to an ACPI ISA string which the
Linux kernel will parse, but it'll be odd to point people at the DT
binding to do that. Maybe we should consider making the parser more
complete, possibly by importing it from some reference implementation or
something.

[1] https://drive.google.com/file/d/1nP3nFiH4jkPMp6COOxP6123DCZKR-tia/view

Thanks,
drew

> 
> C language + string processing == potential attack vector
> 
> > 
> > > How about completely aborting, noisily, when the string doesn't match
> > > expectations, falling back to a default string such as rv64ima instead.
> > > That also ought to get faster corrections of device trees.
> > 
> > I did this first actually, but I was afraid that it would cause
> > regressions?
> > 
> > If you have riscv,isa = "rv64imafdc_Zifencei_zicbom", yes that is
> > invalid and dt-validate would have told you so, but at present that
> > would be parsed as "rv64imafdc_zicbom" which is a perfect description of
> > the hardware in question (since the meaning of i was set before RVI made
> > a hames of things).
> > 
> > So that's why I opted to not do some sort of pr_err/BUG()/WARN() and
> > try to keep processing the string. I'm happy to abort entirely on
> > reaching a capital if people feel there's unlikely to be a fallout from
> > that.
> 
> There might be fallout, but the kernel needs to defend itself. IMO, if
> the kernel doesn't know how to parse something, then it should stop
> trying to immediately, either with a BUG(), refusing to accept any
> part of it, by fallbacking back to a default, or by only accepting what
> it believes it parsed correctly.
> 
> The third option is probably a reasonable choice in this case.
> 
> Thanks,
> drew
Conor Dooley April 26, 2023, 1:58 p.m. UTC | #5
On Wed, Apr 26, 2023 at 03:08:25PM +0200, Andrew Jones wrote:
> On Wed, Apr 26, 2023 at 01:47:39PM +0100, Conor Dooley wrote:
> > On Wed, Apr 26, 2023 at 02:18:52PM +0200, Andrew Jones wrote:
> > > On Wed, Apr 26, 2023 at 11:43:24AM +0100, Conor Dooley wrote:
> > > > Yangyu Chen reported that if an multi-letter extension begins with a
> > > > capital letter the parser will treat the remainder of that multi-letter
> > > > extension as single-letter extensions.
> > > 
> > > I think the problem is that the parser doesn't completely abort when
> > > it sees something it doesn't understand. Continuing is risky since
> > > it may be possible to compose an invalid string that gets the parser
> > > to run off the rails.
> > 
> > Usually I am of the opinion that we should not seek the validate the dt
> > in the kernel, since there are tools for doing so *cough* dt-validate
> > *cough*. This one seemed like low hanging fruit though, since the parser
> > handles having capital letters in any of the other places after the
> > rv##, but falls over pretty badly for this particular issue.
> > 
> > In general, I don't think we need to be concerned about anything that
> > fails dt-validate though, you kinda need to trust that that is correct.
> > I'd argue that we might even do too much validation in the parser at
> > present.
> > Is there some attack vector, or ACPI related consideration, that I am
> > unaware of that makes this risky?
> 
> C language + string processing == potential attack vector

Right. I thought there was some specific scenario that you had in mind,
rather than the "obvious" "parsing strings is bad".
What I was wondering is whether the devicetree is an attack vector you
actually have to care about? I had thought it wasn't, hence asking.

> > > How about completely aborting, noisily, when the string doesn't match
> > > expectations, falling back to a default string such as rv64ima instead.
> > > That also ought to get faster corrections of device trees.
> > 
> > I did this first actually, but I was afraid that it would cause
> > regressions?
> > 
> > If you have riscv,isa = "rv64imafdc_Zifencei_zicbom", yes that is
> > invalid and dt-validate would have told you so, but at present that
> > would be parsed as "rv64imafdc_zicbom" which is a perfect description of
> > the hardware in question (since the meaning of i was set before RVI made
> > a hames of things).

After thinking about it a bit cycling home, I don't actually think that
this would be a regression. If your dt is not valid, then that's your
problem, not ours :)
Valid dt will be parsed correctly before and after such a change, so I
think that that is actually okay.
The dt-binding exists for a reason, and can be pointed to if anyone
claims this is a regression I think.

> > So that's why I opted to not do some sort of pr_err/BUG()/WARN() and
> > try to keep processing the string. I'm happy to abort entirely on
> > reaching a capital if people feel there's unlikely to be a fallout from
> > that.
> 
> There might be fallout, but the kernel needs to defend itself. IMO, if
> the kernel doesn't know how to parse something, then it should stop
> trying to immediately, either with a BUG(), refusing to accept any
> part of it, by fallbacking back to a default, or by only accepting what
> it believes it parsed correctly.
> 
> The third option is probably a reasonable choice in this case.

My patch could be interpreted as meeting the criteria for option 3.
I think you instead mean "stop parsing at that point & only report the
extensions seen prior to the first bad one"?

Cheers,
Conor.
Andrew Jones April 26, 2023, 2:27 p.m. UTC | #6
On Wed, Apr 26, 2023 at 02:58:45PM +0100, Conor Dooley wrote:
> On Wed, Apr 26, 2023 at 03:08:25PM +0200, Andrew Jones wrote:
> > On Wed, Apr 26, 2023 at 01:47:39PM +0100, Conor Dooley wrote:
> > > On Wed, Apr 26, 2023 at 02:18:52PM +0200, Andrew Jones wrote:
> > > > On Wed, Apr 26, 2023 at 11:43:24AM +0100, Conor Dooley wrote:
> > > > > Yangyu Chen reported that if an multi-letter extension begins with a
> > > > > capital letter the parser will treat the remainder of that multi-letter
> > > > > extension as single-letter extensions.
> > > > 
> > > > I think the problem is that the parser doesn't completely abort when
> > > > it sees something it doesn't understand. Continuing is risky since
> > > > it may be possible to compose an invalid string that gets the parser
> > > > to run off the rails.
> > > 
> > > Usually I am of the opinion that we should not seek the validate the dt
> > > in the kernel, since there are tools for doing so *cough* dt-validate
> > > *cough*. This one seemed like low hanging fruit though, since the parser
> > > handles having capital letters in any of the other places after the
> > > rv##, but falls over pretty badly for this particular issue.
> > > 
> > > In general, I don't think we need to be concerned about anything that
> > > fails dt-validate though, you kinda need to trust that that is correct.
> > > I'd argue that we might even do too much validation in the parser at
> > > present.
> > > Is there some attack vector, or ACPI related consideration, that I am
> > > unaware of that makes this risky?
> > 
> > C language + string processing == potential attack vector
> 
> Right. I thought there was some specific scenario that you had in mind,
> rather than the "obvious" "parsing strings is bad".

Yup, I only pointed out the obvious since it's always possible, at least
for me, to lose sight of the forest for the trees.

> What I was wondering is whether the devicetree is an attack vector you
> actually have to care about? I had thought it wasn't, hence asking.

Nope, I haven't put any thought into this potential attack vector beyond
this discussion.

> 
> > > > How about completely aborting, noisily, when the string doesn't match
> > > > expectations, falling back to a default string such as rv64ima instead.
> > > > That also ought to get faster corrections of device trees.
> > > 
> > > I did this first actually, but I was afraid that it would cause
> > > regressions?
> > > 
> > > If you have riscv,isa = "rv64imafdc_Zifencei_zicbom", yes that is
> > > invalid and dt-validate would have told you so, but at present that
> > > would be parsed as "rv64imafdc_zicbom" which is a perfect description of
> > > the hardware in question (since the meaning of i was set before RVI made
> > > a hames of things).
> 
> After thinking about it a bit cycling home, I don't actually think that
> this would be a regression. If your dt is not valid, then that's your
> problem, not ours :)
> Valid dt will be parsed correctly before and after such a change, so I
> think that that is actually okay.
> The dt-binding exists for a reason, and can be pointed to if anyone
> claims this is a regression I think.

I agree.

> 
> > > So that's why I opted to not do some sort of pr_err/BUG()/WARN() and
> > > try to keep processing the string. I'm happy to abort entirely on
> > > reaching a capital if people feel there's unlikely to be a fallout from
> > > that.
> > 
> > There might be fallout, but the kernel needs to defend itself. IMO, if
> > the kernel doesn't know how to parse something, then it should stop
> > trying to immediately, either with a BUG(), refusing to accept any
> > part of it, by fallbacking back to a default, or by only accepting what
> > it believes it parsed correctly.
> > 
> > The third option is probably a reasonable choice in this case.
> 
> My patch could be interpreted as meeting the criteria for option 3.
> I think you instead mean "stop parsing at that point & only report the
> extensions seen prior to the first bad one"?

Right.

Thanks,
drew
Conor Dooley April 26, 2023, 2:37 p.m. UTC | #7
On Wed, Apr 26, 2023 at 03:54:55PM +0200, Andrew Jones wrote:
> On Wed, Apr 26, 2023 at 03:08:25PM +0200, Andrew Jones wrote:
> > On Wed, Apr 26, 2023 at 01:47:39PM +0100, Conor Dooley wrote:
> > > On Wed, Apr 26, 2023 at 02:18:52PM +0200, Andrew Jones wrote:
> > > > On Wed, Apr 26, 2023 at 11:43:24AM +0100, Conor Dooley wrote:
> > > > > Yangyu Chen reported that if an multi-letter extension begins with a
> > > > > capital letter the parser will treat the remainder of that multi-letter
> > > > > extension as single-letter extensions.
> > > > 
> > > > I think the problem is that the parser doesn't completely abort when
> > > > it sees something it doesn't understand. Continuing is risky since
> > > > it may be possible to compose an invalid string that gets the parser
> > > > to run off the rails.
> > > 
> > > Usually I am of the opinion that we should not seek the validate the dt
> > > in the kernel, since there are tools for doing so *cough* dt-validate
> > > *cough*. This one seemed like low hanging fruit though, since the parser
> > > handles having capital letters in any of the other places after the
> > > rv##, but falls over pretty badly for this particular issue.
> > > 
> > > In general, I don't think we need to be concerned about anything that
> > > fails dt-validate though, you kinda need to trust that that is correct.
> > > I'd argue that we might even do too much validation in the parser at
> > > present.
> > > Is there some attack vector, or ACPI related consideration, that I am
> > > unaware of that makes this risky?
> 
> A bit unrelated to this, but your mention of ACPI made me go look at the
> approved ECR[1] again for the ISA string. It says "Null-terminated ASCII
> Instruction Set Architecture (ISA) string for this hart. The format of the
> ISA string is defined in the RISC-V unprivileged specification." I suppose
> we can still add additional requirements to an ACPI ISA string which the
> Linux kernel will parse, but it'll be odd to point people at the DT
> binding to do that. Maybe we should consider making the parser more
> complete, possibly by importing it from some reference implementation or
> something.

Heh, I wonder are we heading for some divergence here then. riscv,isa in
a DT is explicitly *not* a match for that due to the
backwards-incompatible changes made by RVI to extension definitions
since riscv,isa was added to the dt-binding. Clarifying that one is the
next patch in my todo list..

ACPI naively saying "it matches the spec" is asking for trouble, since
there does not actually appear to be any sort of clarification about
which *version* of the spec that may be. At least in the dt-binding, we
have a format there, what happens to the ACPI spec if RVI decides that -
is a suitable alternative to _ in some future edition? I don't think
such a thing is all that likely, but surely you'd like to insulate the
ABI from that sort of eventuality?

Perhaps the thing to do is to actually take Yangyu's first patch and my
second one, since the problem with backwards compatibility doesn't stop
the kernel from being more permissive?

Cheers,
Conor.

> 
> [1] https://drive.google.com/file/d/1nP3nFiH4jkPMp6COOxP6123DCZKR-tia/view
> 
> Thanks,
> drew
> 
> > 
> > C language + string processing == potential attack vector
> > 
> > > 
> > > > How about completely aborting, noisily, when the string doesn't match
> > > > expectations, falling back to a default string such as rv64ima instead.
> > > > That also ought to get faster corrections of device trees.
> > > 
> > > I did this first actually, but I was afraid that it would cause
> > > regressions?
> > > 
> > > If you have riscv,isa = "rv64imafdc_Zifencei_zicbom", yes that is
> > > invalid and dt-validate would have told you so, but at present that
> > > would be parsed as "rv64imafdc_zicbom" which is a perfect description of
> > > the hardware in question (since the meaning of i was set before RVI made
> > > a hames of things).
> > > 
> > > So that's why I opted to not do some sort of pr_err/BUG()/WARN() and
> > > try to keep processing the string. I'm happy to abort entirely on
> > > reaching a capital if people feel there's unlikely to be a fallout from
> > > that.
> > 
> > There might be fallout, but the kernel needs to defend itself. IMO, if
> > the kernel doesn't know how to parse something, then it should stop
> > trying to immediately, either with a BUG(), refusing to accept any
> > part of it, by fallbacking back to a default, or by only accepting what
> > it believes it parsed correctly.
> > 
> > The third option is probably a reasonable choice in this case.
> > 
> > Thanks,
> > drew
Andrew Jones April 26, 2023, 3:01 p.m. UTC | #8
On Wed, Apr 26, 2023 at 03:37:58PM +0100, Conor Dooley wrote:
> On Wed, Apr 26, 2023 at 03:54:55PM +0200, Andrew Jones wrote:
> > On Wed, Apr 26, 2023 at 03:08:25PM +0200, Andrew Jones wrote:
> > > On Wed, Apr 26, 2023 at 01:47:39PM +0100, Conor Dooley wrote:
> > > > On Wed, Apr 26, 2023 at 02:18:52PM +0200, Andrew Jones wrote:
> > > > > On Wed, Apr 26, 2023 at 11:43:24AM +0100, Conor Dooley wrote:
> > > > > > Yangyu Chen reported that if an multi-letter extension begins with a
> > > > > > capital letter the parser will treat the remainder of that multi-letter
> > > > > > extension as single-letter extensions.
> > > > > 
> > > > > I think the problem is that the parser doesn't completely abort when
> > > > > it sees something it doesn't understand. Continuing is risky since
> > > > > it may be possible to compose an invalid string that gets the parser
> > > > > to run off the rails.
> > > > 
> > > > Usually I am of the opinion that we should not seek the validate the dt
> > > > in the kernel, since there are tools for doing so *cough* dt-validate
> > > > *cough*. This one seemed like low hanging fruit though, since the parser
> > > > handles having capital letters in any of the other places after the
> > > > rv##, but falls over pretty badly for this particular issue.
> > > > 
> > > > In general, I don't think we need to be concerned about anything that
> > > > fails dt-validate though, you kinda need to trust that that is correct.
> > > > I'd argue that we might even do too much validation in the parser at
> > > > present.
> > > > Is there some attack vector, or ACPI related consideration, that I am
> > > > unaware of that makes this risky?
> > 
> > A bit unrelated to this, but your mention of ACPI made me go look at the
> > approved ECR[1] again for the ISA string. It says "Null-terminated ASCII
> > Instruction Set Architecture (ISA) string for this hart. The format of the
> > ISA string is defined in the RISC-V unprivileged specification." I suppose
> > we can still add additional requirements to an ACPI ISA string which the
> > Linux kernel will parse, but it'll be odd to point people at the DT
> > binding to do that. Maybe we should consider making the parser more
> > complete, possibly by importing it from some reference implementation or
> > something.
> 
> Heh, I wonder are we heading for some divergence here then. riscv,isa in
> a DT is explicitly *not* a match for that due to the
> backwards-incompatible changes made by RVI to extension definitions
> since riscv,isa was added to the dt-binding. Clarifying that one is the
> next patch in my todo list..
> 
> ACPI naively saying "it matches the spec" is asking for trouble, since
> there does not actually appear to be any sort of clarification about
> which *version* of the spec that may be. At least in the dt-binding, we
> have a format there, what happens to the ACPI spec if RVI decides that -
> is a suitable alternative to _ in some future edition? I don't think
> such a thing is all that likely, but surely you'd like to insulate the
> ABI from that sort of eventuality?

Agreed. I'll raise this concern with the ACPI people.

> 
> Perhaps the thing to do is to actually take Yangyu's first patch and my
> second one, since the problem with backwards compatibility doesn't stop
> the kernel from being more permissive?

I guess so? Every time you wake up and see a parser, you get six more
weeks of Winter.

drew
Yangyu Chen April 26, 2023, 5:11 p.m. UTC | #9
Hi,

On Wed, 26 Apr 2023 15:37:58 +0100, Conor Dooley wrote:
> Perhaps the thing to do is to actually take Yangyu's first patch and my
> second one, since the problem with backwards compatibility doesn't stop
> the kernel from being more permissive?

How about taking my first patch[1] since the ECR[2] mentioned that
the format of the ISA string is defined in the RISC-V unprivileged
specification? However, I think we still need to stop the parser if
some characters that the parser is not able to handle as Andrew Jones
mentioned in the previous mail[3]. In this case, we still need to add
some code to stop parsing if any error happens.

In my humble opinion, backward compatibility can be solved by backports
to LTS kernels. I think the better option is to allow using uppercase
letters in the device-tree document to make the parser coherent with
RISC-V ISA specification but recommend using all lowercase letters for
better backward compatibility.

[1] https://lore.kernel.org/all/tencent_63090269FF399AE30AC774848C344EF2F10A@qq.com/
[2] https://drive.google.com/file/d/1nP3nFiH4jkPMp6COOxP6123DCZKR-tia/view
[3] https://lore.kernel.org/all/d7t6nxmblmyqriogs4bxakpse3qc7pc6cczjnhmkpk4kjwvgcb@3aihh3erdn6p/

Thanks,
Yangyu Chen
Conor Dooley April 26, 2023, 5:47 p.m. UTC | #10
On Thu, Apr 27, 2023 at 01:11:00AM +0800, Yangyu Chen wrote:
> Hi,
> 
> On Wed, 26 Apr 2023 15:37:58 +0100, Conor Dooley wrote:
> > Perhaps the thing to do is to actually take Yangyu's first patch and my
> > second one, since the problem with backwards compatibility doesn't stop
> > the kernel from being more permissive?
> 
> How about taking my first patch[1] since the ECR[2] mentioned that
> the format of the ISA string is defined in the RISC-V unprivileged
> specification?

That is what I suggested, no? Your 1/2 with a revised subject noting
that ACPI may need it, rather than talking about DT. See also my
comments to Drew about the "perils" of letting undefined spec versions
have control over your ABI.

> However, I think we still need to stop the parser if
> some characters that the parser is not able to handle as Andrew Jones
> mentioned in the previous mail[3]. In this case, we still need to add
> some code to stop parsing if any error happens.

Currently it skips extensions that are not valid, but keeps parsing.
Apart from the case where SZX are capital letters it "should" either
parse into something resembling correct or skip. If we start parsing
in a case-invariant manner, I don't see any immediately problem with
what we currently have.

I just don't really get what we need to "protect" the kernel from.
If someone controls the dtb, I think what they do to the isa string is
probably the least of our worries.

> In my humble opinion, backward compatibility can be solved by backports
> to LTS kernels.

The binding might lie in Linux, but that does not mean we can fix the
problem by backporting parser changes to stable. There are other users
and Linux kernels that would pre-date the change that we would be
inflicting this relaxation on for no benefit at all. U-Boot for example
does a case-sensitive comparison.

> I think the better option is to allow using uppercase
> letters in the device-tree document to make the parser coherent with
> RISC-V ISA specification but recommend using all lowercase letters for
> better backward compatibility.

Any addition of uppercase to that binding will get my NAK.
There is no realistic benefit to doing so, only potential for disruption.
DT generators should emit DT that complies with bindings ¯\_(ツ)_/¯.

I'll go take a proper look at your 1/2 from the other day. I had a
comment about it that I didn't leave, but will do so now.

Thanks,
Conor.
diff mbox series

Patch

diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 52585e088873..93850540b0b4 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -142,6 +142,10 @@  void __init riscv_fill_hwcap(void)
 			const char *ext_end = isa;
 			bool ext_long = false, ext_err = false;
 
+			if (unlikely(!islower(*ext))) {
+				ext_err = true;
+			}
+
 			switch (*ext) {
 			case 's':
 				/**
@@ -156,6 +160,15 @@  void __init riscv_fill_hwcap(void)
 					break;
 				}
 				fallthrough;
+			case 'S':
+			case 'X':
+			case 'Z':
+				/*
+				 * As the riscv,isa string must be lower-case,
+				 * S, X and Z are not valid characters. Parse
+				 * the invalid extension anyway, to skip ahead
+				 * to the next valid one.
+				 */
 			case 'x':
 			case 'z':
 				ext_long = true;
@@ -185,10 +198,8 @@  void __init riscv_fill_hwcap(void)
 				++ext_end;
 				break;
 			default:
-				if (unlikely(!islower(*ext))) {
-					ext_err = true;
+				if (unlikely(ext_err))
 					break;
-				}
 				/* Find next extension */
 				if (!isdigit(*isa))
 					break;